Policy gradient for rectangular robust Markov decision processes

N Kumar, E Derman, M Geist… - Advances in Neural …, 2023 - proceedings.neurips.cc
Policy gradient methods have become a standard for training reinforcement learning agents
in a scalable and efficient manner. However, they do not account for transition uncertainty …

Twice regularized MDPs and the equivalence between robustness and regularization

E Derman, M Geist, S Mannor - Advances in Neural …, 2021 - proceedings.neurips.cc
Robust Markov decision processes (MDPs) aim to handle changing or partially
known system dynamics. To solve them, one typically resorts to robust optimization methods …

Towards robust offline reinforcement learning under diverse data corruption

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline reinforcement learning (RL) presents a promising approach for learning reinforced
policies from offline datasets without the need for costly or unsafe interactions with the …

On the convex formulations of robust Markov decision processes

J Grand-Clément, M Petrik - Mathematics of Operations …, 2024 - pubsonline.informs.org
Robust Markov decision processes (MDPs) are used for applications of dynamic
optimization in uncertain environments and have been studied extensively. Many of the …

Adversarial interpretation of Bayesian inference

H Husain, J Knoblauch - International Conference on …, 2022 - proceedings.mlr.press
We build on the optimization-centric view on Bayesian inference advocated by Knoblauch et
al. (2019). Thinking about Bayesian and generalized Bayesian posteriors as the solutions to …

Solving non-rectangular reward-robust MDPs via frequency regularization

U Gadot, E Derman, N Kumar, MM Elfatihi… - Proceedings of the …, 2024 - ojs.aaai.org
In robust Markov decision processes (RMDPs), it is assumed that the reward and the
transition dynamics lie in a given uncertainty set. By targeting maximal return under the most …

Roping in Uncertainty: Robustness and Regularization in Markov Games

J McMahan, G Artiglio, Q Xie - arXiv preprint arXiv:2406.08847, 2024 - arxiv.org
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of an $s$-…

Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: theory and practice

T Kitamura, T Kozuno, Y Tang… - International …, 2023 - proceedings.mlr.press
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-
regularized reinforcement learning (RL), has served as the basis for recent high-performing …

Model-free risk-sensitive reinforcement learning

G Delétang, J Grau-Moya, M Kunesch… - arXiv preprint arXiv …, 2021 - arxiv.org
We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free
reinforcement learning algorithms. This extension can be regarded as a modification of the …

Robust reinforcement learning in continuous control tasks with uncertainty set regularization

Y Zhang, J Wang, J Boedecker - Conference on Robot …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) is recognized as lacking generalization and robustness under
environmental perturbations, which excessively restricts its application for real-world …