Risk-constrained reinforcement learning with percentile risk criteria

Y Chow, M Ghavamzadeh, L Janson… - Journal of Machine …, 2018 - jmlr.org
In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, i.e., increased awareness of events of small …

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Algorithms for CVaR optimization in MDPs

Y Chow, M Ghavamzadeh - Advances in neural information …, 2014 - proceedings.neurips.cc
In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in costs in addition to minimizing a standard criterion …

Risk-averse offline reinforcement learning

NA Urpí, S Curi, A Krause - arXiv preprint arXiv:2102.05371, 2021 - arxiv.org
Training Reinforcement Learning (RL) agents in high-stakes applications might be too
prohibitive due to the risk associated to exploration. Thus, the agent can only use data …

Policy gradient for coherent risk measures

A Tamar, Y Chow, M Ghavamzadeh… - Advances in neural …, 2015 - proceedings.neurips.cc
Several authors have recently developed risk-sensitive policy gradient methods that
augment the standard expected cost minimization problem with a measure of variability in …

Sequential decision making with coherent risk

A Tamar, Y Chow, M Ghavamzadeh… - IEEE transactions on …, 2016 - ieeexplore.ieee.org
We provide sampling-based algorithms for optimization under a coherent-risk objective. The
class of coherent-risk measures is widely accepted in finance and operations research …

Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models

A Majumdar, S Singh… - … science and systems, 2017 - m.roboticsproceedings.org
The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take
actions in order to minimize the expected value of a cost function, i.e., that humans are risk …

Bayesian robust optimization for imitation learning

D Brown, S Niekum, M Petrik - Advances in Neural …, 2020 - proceedings.neurips.cc
One of the main challenges in imitation learning is determining what action an agent should
take when outside the state distribution of the demonstrations. Inverse reinforcement …

Reinforcement learning with dynamic convex risk measures

A Coache, S Jaimungal - Mathematical Finance, 2024 - Wiley Online Library
We develop an approach for solving time-consistent risk-sensitive stochastic optimization
problems using model-free reinforcement learning (RL). Specifically, we assume agents …

Learning to optimize with stochastic dominance constraints

H Dai, Y Xue, N He, Y Wang, N Li… - International …, 2023 - proceedings.mlr.press
In real-world decision-making, uncertainty is important yet difficult to handle. Stochastic
dominance provides a theoretically sound approach to comparing uncertain quantities, but …