Mazumdar, Abhijit;
Wisniewski, Rafal;
Bujorianu, Manuela L;
(2023)
Online Learning of Safety function for Markov Decision Processes.
In:
2023 European Control Conference (ECC).
(pp. 1-6).
IEEE: Bucharest, Romania.
Bujorianu_Final_draft_ECC.pdf - Accepted Version (424kB)
Abstract
In this paper, we study safety specifications for a Markov decision process with a stochastic stopping time in an almost model-free setting. Our approach involves characterizing a proxy set: the states that are close, in a probabilistic sense, to the set of unsafe states (the forbidden set). We also provide results that relate the safety function to reinforcement learning. Building on these results, we develop an online algorithm based on the temporal difference method to compute the safety function. Finally, we provide simulation results that demonstrate our approach on a simple example.
Type: | Proceedings paper |
---|---|
Title: | Online Learning of Safety function for Markov Decision Processes |
Event: | 2023 European Control Conference (ECC) |
Dates: | 13 Jun 2023 - 16 Jun 2023 |
ISBN-13: | 978-3-907144-08-4 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.23919/ecc57647.2023.10178361 |
Publisher version: | http://dx.doi.org/10.23919/ecc57647.2023.10178361 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions. |
Keywords: | Markov decision processes, safety, online learning, temporal difference, proxy set |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10188533 |