An approximate solution method for large risk-averse Markov decision processes

Y Chow, M Ghavamzadeh, L Janson… - Journal of Machine …, 2018 - jmlr.org

In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, i.e., increased awareness of events of small …

Cited by 585 · Related articles · All 12 versions

[PDF] arxiv.org

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org

We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Cited by 280 · Related articles · All 9 versions

[PDF] neurips.cc

Algorithms for CVaR optimization in MDPs

Y Chow, M Ghavamzadeh - Advances in neural information …, 2014 - proceedings.neurips.cc

In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in costs in addition to minimizing a standard criterion …

Cited by 382 · Related articles · All 11 versions

[PDF] arxiv.org

Risk-averse offline reinforcement learning

NA Urpí, S Curi, A Krause - arXiv preprint arXiv:2102.05371, 2021 - arxiv.org

Training Reinforcement Learning (RL) agents in high-stakes applications might be too
prohibitive due to the risk associated to exploration. Thus, the agent can only use data …

Cited by 82 · Related articles · All 2 versions

[PDF] neurips.cc

Policy gradient for coherent risk measures

A Tamar, Y Chow, M Ghavamzadeh… - Advances in neural …, 2015 - proceedings.neurips.cc

Several authors have recently developed risk-sensitive policy gradient methods that
augment the standard expected cost minimization problem with a measure of variability in …

Cited by 145 · Related articles · All 8 versions

[PDF] github.io

Sequential decision making with coherent risk

A Tamar, Y Chow, M Ghavamzadeh… - IEEE transactions on …, 2016 - ieeexplore.ieee.org

We provide sampling-based algorithms for optimization under a coherent-risk objective. The
class of coherent-risk measures is widely accepted in finance and operations research …

Cited by 82 · Related articles · All 2 versions

[PDF] roboticsproceedings.org

[PDF] Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models

A Majumdar, S Singh… - … science and systems, 2017 - m.roboticsproceedings.org

The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take
actions in order to minimize the expected value of a cost function, i.e., that humans are risk …

Cited by 79 · Related articles · All 7 versions

[PDF] neurips.cc

Bayesian robust optimization for imitation learning

D Brown, S Niekum, M Petrik - Advances in Neural …, 2020 - proceedings.neurips.cc

One of the main challenges in imitation learning is determining what action an agent should
take when outside the state distribution of the demonstrations. Inverse reinforcement …

Cited by 40 · Related articles · All 7 versions

[PDF] wiley.com

Reinforcement learning with dynamic convex risk measures

A Coache, S Jaimungal - Mathematical Finance, 2024 - Wiley Online Library

We develop an approach for solving time‐consistent risk‐sensitive stochastic optimization
problems using model‐free reinforcement learning (RL). Specifically, we assume agents …

Cited by 22 · Related articles · All 5 versions

[PDF] mlr.press

Learning to optimize with stochastic dominance constraints

H Dai, Y Xue, N He, Y Wang, N Li… - International …, 2023 - proceedings.mlr.press

In real-world decision-making, uncertainty is important yet difficult to handle. Stochastic
dominance provides a theoretically sound approach to comparing uncertain quantities, but …

Cited by 7 · Related articles · All 6 versions