Wu, Shuang;
Shi, Ling;
Wang, Jun;
Tian, Guangjian;
(2022)
Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach.
In: Chaudhuri, K; Jegelka, S; Song, L; Szepesvari, C; Niu, G; Sabato, S (eds.)
Proceedings of the 39th International Conference on Machine Learning.
(pp. 24131-24149).
Proceedings of Machine Learning Research (PMLR): Baltimore, MD, USA.
Abstract
The REINFORCE algorithm from Williams is popular in policy gradient (PG) for solving reinforcement learning (RL) problems. Meanwhile, the theoretical form of PG is from Sutton et al. Although both formulae prescribe PG, their precise connections are not yet illustrated. Recently, Nota and Thomas (2020) have found that the ambiguity causes implementation errors. Motivated by the ambiguity and implementation incorrectness, we study PG from a perturbation perspective. In particular, we derive PG in a unified framework, precisely clarify the relation between PG implementation and theory, and echo back the findings by Nota and Thomas. Diving into factors contributing to empirical successes of the existing erroneous implementations, we find that small approximation error and the experience replay mechanism play critical roles.
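As a minimal illustration of the kind of gap between PG theory and implementation that the abstract alludes to, the sketch below assumes the discrepancy is the one reported by Nota and Thomas (2020): the gradient of the discounted objective weights the step-t term by gamma^t, while many implementations drop that factor. The function name `reinforce_weights` and the toy reward sequence are hypothetical, chosen only for illustration; this is not the authors' code.

```python
from typing import List, Tuple


def reinforce_weights(rewards: List[float], gamma: float) -> Tuple[List[float], List[float]]:
    """Per-step scalars that multiply grad log pi(a_t | s_t) in a REINFORCE-style update.

    theory[t] = gamma**t * G_t   (assumed form for the discounted objective)
    common[t] = G_t              (assumed common implementation, gamma**t dropped)
    where G_t = sum_{k >= t} gamma**(k - t) * r_k is the discounted return-to-go.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted returns-to-go from the last step backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    theory = [(gamma ** t) * g for t, g in enumerate(returns)]
    common = list(returns)
    return theory, common


# Toy trajectory (hypothetical): three steps, reward 1 at each step.
theory, common = reinforce_weights([1.0, 1.0, 1.0], gamma=0.5)
print(theory)  # weights with the gamma**t factor
print(common)  # weights without it
```

The two weightings agree at t = 0 and diverge more at later steps, so under this assumption the implementation error matters most for long trajectories and for gamma well below 1.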
Type: | Proceedings paper |
---|---|
Title: | Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach |
Event: | 39th International Conference on Machine Learning |
Location: | Baltimore, MD |
Dates: | 17 Jul 2022 - 23 Jul 2022 |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | https://proceedings.mlr.press/v162/wu22i.html |
Language: | English |
Additional information: | This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10185806 |