Jiang, Minqi Sebastian;
(2023)
Learning Curricula in Open-Ended Worlds.
Doctoral thesis (Ph.D), UCL (University College London).
Preview |
Text
Jiang_10176653_Thesis_corrected.pdf Download (15MB) | Preview |
Abstract
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real-world is highly open-ended—featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing across a large space of simulated environments is insufficient, as it requires making arbitrary distributional assumptions, and as the design space grows, it can become combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which seeks to enable such an open-ended process via a principled approach for gradually improving the robustness and generality of the learning agent. Given a potentially open-ended environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent’s capabilities. Through both extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that approach general intelligence—a long sought-after ambition of artificial intelligence research—by continually generating and mastering additional challenges of their own design.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Learning Curricula in Open-Ended Worlds |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL UCL > Provost and Vice Provost Offices > UCL BEAMS UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10176653 |
Archive Staff Only
![]() |
View Item |