On the convergence and optimality of policy gradient for Markov coherent risk

N Bäuerle, A Jaśkiewicz - Mathematical Methods of Operations Research, 2024 - Springer

The paper provides an overview of the theory and applications of risk-sensitive Markov
decision processes. The term 'risk-sensitive' refers here to the use of the Optimized Certainty …

Cited by 9 · Related articles · All 7 versions

[PDF] arxiv.org

Robust risk-aware reinforcement learning

S Jaimungal, SM Pesenti, YS Wang, H Tatsat - SIAM Journal on Financial …, 2022 - SIAM

We present a reinforcement learning (RL) approach for robust optimization of risk-aware
performance criteria. To allow agents to express a wide variety of risk-reward profiles, we …

Cited by 42 · Related articles · All 10 versions

[PDF] arxiv.org

Conditionally elicitable dynamic risk measures for deep reinforcement learning

A Coache, S Jaimungal, Á Cartea - SIAM Journal on Financial Mathematics, 2023 - SIAM

We propose a novel framework to solve risk-sensitive reinforcement learning problems
where the agent optimizes time-consistent dynamic spectral risk measures. Based on the …

Cited by 22 · Related articles · All 7 versions

[PDF] wiley.com

Reinforcement learning with dynamic convex risk measures

A Coache, S Jaimungal - Mathematical Finance, 2024 - Wiley Online Library

We develop an approach for solving time‐consistent risk‐sensitive stochastic optimization
problems using model‐free reinforcement learning (RL). Specifically, we assume agents …

Cited by 27 · Related articles · All 5 versions

[PDF] arxiv.org

Risk-sensitive Markov decision process and learning under general utility functions

Z Wu, R Xu - arXiv preprint arXiv:2311.13589, 2023 - arxiv.org

Reinforcement Learning (RL) has gained substantial attention across diverse application
domains and theoretical investigations. Existing literature on RL theory largely focuses on …

Cited by 4 · Related articles · All 3 versions

[PDF] mlr.press

On the global convergence of risk-averse policy gradient methods with expected conditional risk measures

X Yu, L Ying - International Conference on Machine …, 2023 - proceedings.mlr.press

Risk-sensitive reinforcement learning (RL) has become a popular tool to control the risk of
uncertain outcomes and ensure reliable performance in various sequential decision-making …

Cited by 3 · Related articles · All 6 versions

[PDF] arxiv.org

Sequential Decision-Making under Uncertainty: A Robust MDPs review

W Ou, S Bi - arXiv preprint arXiv:2404.00940, 2024 - arxiv.org

This review paper provides an in-depth overview of the evolution and advancements in
Robust Markov Decision Processes (RMDPs), a field of paramount importance for its role in …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization

Y Luo, Y Pan, H Wang, P Torr, P Poupart - arXiv preprint arXiv:2403.11062, 2024 - arxiv.org

Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional
Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their …

Cited by 2 · Related articles · All 3 versions

[PDF] arxiv.org

Markov Chain Variance Estimation: A Stochastic Approximation Approach

S Agrawal, ST Maguluri - arXiv preprint arXiv:2409.05733, 2024 - arxiv.org

We consider the problem of estimating the asymptotic variance of a function defined on a
Markov chain, an important step for statistical inference of the stationary mean. We design a …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Risk-Averse Finetuning of Large Language Models

S Chaudhary, U Dinesha, D Kalathil… - arXiv preprint arXiv …, 2025 - arxiv.org

We consider the challenge of mitigating the generation of negative or toxic content by the
Large Language Models (LLMs) in response to certain prompts. We propose integrating risk …