Online Markov decision processes with non-oblivious strategic adversary


Dinh, LC; Mguni, DH; Tran-Thanh, L; Wang, J; Yang, Y; (2023) Online Markov decision processes with non-oblivious strategic adversary. Autonomous Agents and Multi-Agent Systems, 37 (1), Article 15. 10.1007/s10458-023-09599-5. Green open access

2110.03604.pdf - Accepted Version

Abstract

We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still apply and achieve a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)}+\tau^2\sqrt{T\log(|A|)})$, where $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space. Considering real-world games where the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)}+\tau^2\sqrt{Tk\log(k)})$, where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and thus can solve games with prohibitively large action spaces. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external regret adversary in OMDPs, we introduce an algorithm that achieves last-round convergence to a NE. To the best of our knowledge, this is the first work to obtain a last-iteration result in OMDPs.
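For readers unfamiliar with the term, the following is a minimal sketch of one standard no-external regret algorithm, Hedge (multiplicative weights). It is not taken from the paper and is not MDP-Expert or MDP-OOE; it only illustrates the kind of learning rule the adversary is assumed to follow. The function name `hedge`, the random loss sequence, and the learning rate are assumptions chosen for illustration, with losses assumed to lie in [0, 1].

```python
import math
import random


def hedge(loss_rounds, n_actions, eta):
    """Run Hedge on a sequence of loss vectors with entries in [0, 1].

    Returns the learner's expected cumulative loss and the cumulative
    loss of the best fixed action in hindsight.
    """
    weights = [1.0] * n_actions
    cumulative = [0.0] * n_actions
    learner_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the mixed strategy played this round.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for a in range(n_actions):
            cumulative[a] += losses[a]
            # Exponentially down-weight actions that incurred loss.
            weights[a] *= math.exp(-eta * losses[a])
    return learner_loss, min(cumulative)


# Illustrative run on random losses (an assumption, not the paper's setting).
T, n = 2000, 5
rng = random.Random(0)
rounds = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)  # standard tuning for a known horizon T
learner, best = hedge(rounds, n, eta)
regret = learner - best
bound = math.sqrt(T * math.log(n) / 2)  # classical Hedge guarantee for this eta
```

With this tuning the external regret grows on the order of the square root of T log(n), which has the same form as the first term of the policy regret bounds quoted in the abstract, where L is the size of the adversary's pure strategy set.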

Type:	Article
Title:	Online Markov decision processes with non-oblivious strategic adversary
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1007/s10458-023-09599-5
Publisher version:	https://doi.org/10.1007/s10458-023-09599-5
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Multi-agent system, Game theory, Online learning, Online Markov decision processes, Non-oblivious adversary, Last round convergence
UCL classification:	UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery-pp.ucl.ac.uk/id/eprint/10164563
