Policy gradient for rectangular robust Markov decision processes

N Kumar, E Derman, M Geist… - Advances in Neural …, 2023 - proceedings.neurips.cc
Policy gradient methods have become a standard for training reinforcement learning agents
in a scalable and efficient manner. However, they do not account for transition uncertainty …

Twice regularized MDPs and the equivalence between robustness and regularization

E Derman, M Geist, S Mannor - Advances in Neural …, 2021 - proceedings.neurips.cc
Robust Markov decision processes (MDPs) aim to handle changing or partially
known system dynamics. To solve them, one typically resorts to robust optimization methods …

Towards robust offline reinforcement learning under diverse data corruption

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline reinforcement learning (RL) presents a promising approach for learning reinforced
policies from offline datasets without the need for costly or unsafe interactions with the …

On the convex formulations of robust Markov decision processes

J Grand-Clément, M Petrik - Mathematics of Operations …, 2024 - pubsonline.informs.org
Robust Markov decision processes (MDPs) are used for applications of dynamic
optimization in uncertain environments and have been studied extensively. Many of the …

Adversarial interpretation of Bayesian inference

H Husain, J Knoblauch - International Conference on …, 2022 - proceedings.mlr.press
We build on the optimization-centric view on Bayesian inference advocated by Knoblauch et
al. (2019). Thinking about Bayesian and generalized Bayesian posteriors as the solutions to …

Solving non-rectangular reward-robust MDPs via frequency regularization

U Gadot, E Derman, N Kumar, MM Elfatihi… - Proceedings of the …, 2024 - ojs.aaai.org
In robust Markov decision processes (RMDPs), it is assumed that the reward and the
transition dynamics lie in a given uncertainty set. By targeting maximal return under the most …

Roping in Uncertainty: Robustness and Regularization in Markov Games

J McMahan, G Artiglio, Q Xie - arXiv preprint arXiv:2406.08847, 2024 - arxiv.org
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of an $s$-…

Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: theory and practice

T Kitamura, T Kozuno, Y Tang… - International …, 2023 - proceedings.mlr.press
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-
regularized reinforcement learning (RL), has served as the basis for recent high-performing …

Model-free risk-sensitive reinforcement learning

G Delétang, J Grau-Moya, M Kunesch… - arXiv preprint arXiv …, 2021 - arxiv.org
We extend temporal-difference (TD) learning in order to obtain risk-sensitive, model-free
reinforcement learning algorithms. This extension can be regarded as a modification of the …

Robust reinforcement learning in continuous control tasks with uncertainty set regularization

Y Zhang, J Wang, J Boedecker - Conference on Robot …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) is recognized as lacking generalization and robustness under
environmental perturbations, which excessively restricts its application for real-world …