UCL Discovery Stage

Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach

Wu, Shuang; Shi, Ling; Wang, Jun; Tian, Guangjian (2022) Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach. In: Chaudhuri, K and Jegelka, S and Song, L and Szepesvari, C and Niu, G and Sabato, S, (eds.) Proceedings of the 39th International Conference on Machine Learning. (pp. 24131-24149). Proceedings of Machine Learning Research (PMLR): Baltimore, MD, USA. Green open access

wu22i.pdf - Published Version


Abstract

The REINFORCE algorithm from Williams is a popular policy gradient (PG) method for solving reinforcement learning (RL) problems. Meanwhile, the theoretical form of PG is from Sutton et al. Although both formulae prescribe PG, the precise connection between them has not yet been made explicit. Recently, Nota and Thomas (2020) found that this ambiguity causes implementation errors. Motivated by the ambiguity and the incorrect implementations, we study PG from a perturbation perspective. In particular, we derive PG in a unified framework, clarify precisely the relation between PG implementation and theory, and corroborate the findings of Nota and Thomas. Examining the factors that contribute to the empirical success of the existing erroneous implementations, we find that small approximation error and the experience replay mechanism play critical roles.

Type: Proceedings paper
Title: Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach
Event: 39th International Conference on Machine Learning
Location: Baltimore, MD
Dates: 17 Jul 2022 - 23 Jul 2022
Open access status: An open access version is available from UCL Discovery
Publisher version: https://proceedings.mlr.press/v162/wu22i.html
Language: English
Additional information: This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10185806
Downloads since deposit: 504