Near-minimax-optimal risk-sensitive reinforcement learning with CVaR

K Wang, N Kallus, W Sun - International Conference on …, 2023 - proceedings.mlr.press
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective
of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm …

Optimal Thompson sampling strategies for support-aware CVaR bandits

D Baudry, R Gautron, E Kaufmann… - … on Machine Learning, 2021 - proceedings.mlr.press
In this paper we study a multi-arm bandit problem in which the quality of each arm is
measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward …

Off-policy risk assessment in contextual bandits

A Huang, L Leqi, Z Lipton… - Advances in Neural …, 2021 - proceedings.neurips.cc
Even when unable to run experiments, practitioners can evaluate prospective policies, using
previously logged data. However, while the bandits literature has adopted a diverse set of …

Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

LA Prashanth, K Jagannathan… - Proceedings of the 37th …, 2020 - proceedings.mlr.press
Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such
as finance. We derive concentration bounds for CVaR estimates, considering separately the …

Nearly optimal Catoni's M-estimator for infinite variance

S Bhatt, G Fang, P Li… - … Conference on Machine …, 2022 - proceedings.mlr.press
In this paper, we extend the remarkable M-estimator of Catoni \citep{Cat12} to situations
where the variance is infinite. In particular, given a sequence of iid random variables $\{X_i\} …

Sample-based bounds for coherent risk measures: Applications to policy synthesis and verification

P Akella, A Dixit, M Ahmadi, JW Burdick, AD Ames - Artificial Intelligence, 2024 - Elsevier
Autonomous systems are increasingly used in highly variable and uncertain environments
giving rise to the pressing need to consider risk in both the synthesis and verification of …

Adaptive best-of-both-worlds algorithm for heavy-tailed multi-armed bandits

J Huang, Y Dai, L Huang - International Conference on …, 2022 - proceedings.mlr.press
In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial
environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi …

Truthful user recruitment for cooperative crowdsensing task: A combinatorial multi-armed bandit approach

H Wang, Y Yang, E Wang, W Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Mobile Crowdsensing (MCS) is a promising paradigm that recruits users to cooperatively
perform a sensing task. When recruiting users, existing works mainly focus on selecting a …

Optimal algorithms for stochastic multi-armed bandits with heavy tailed rewards

K Lee, H Yang, S Lim, S Oh - Advances in Neural …, 2020 - proceedings.neurips.cc

Optimal best-arm identification methods for tail-risk measures

S Agrawal, WM Koolen… - Advances in Neural …, 2021 - proceedings.neurips.cc
Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in
finance and insurance industries as well as in highly reliable, safety-critical uncertain …