Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards.

K Wang, N Kallus, W Sun - International Conference on …, 2023 - proceedings.mlr.press

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective
of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm …

Cited by 15 · Related articles · All 8 versions

[PDF] mlr.press

Optimal Thompson sampling strategies for support-aware CVaR bandits

D Baudry, R Gautron, E Kaufmann… - … on Machine Learning, 2021 - proceedings.mlr.press

In this paper we study a multi-arm bandit problem in which the quality of each arm is
measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward …

Cited by 41 · Related articles · All 12 versions

[PDF] neurips.cc

Off-policy risk assessment in contextual bandits

A Huang, L Leqi, Z Lipton… - Advances in Neural …, 2021 - proceedings.neurips.cc

Even when unable to run experiments, practitioners can evaluate prospective policies, using
previously logged data. However, while the bandits literature has adopted a diverse set of …

Cited by 35 · Related articles · All 7 versions

[PDF] mlr.press

[PDF] Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

LA Prashanth, K Jagannathan… - Proceedings of the 37th …, 2020 - proceedings.mlr.press

Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such
as finance. We derive concentration bounds for CVaR estimates, considering separately the …

Cited by 53 · Related articles · All 5 versions

[PDF] mlr.press

Nearly optimal Catoni's M-estimator for infinite variance

S Bhatt, G Fang, P Li… - … Conference on Machine …, 2022 - proceedings.mlr.press

In this paper, we extend the remarkable M-estimator of Catoni \citep{Cat12} to situations
where the variance is infinite. In particular, given a sequence of iid random variables $\{X_i\} …

Cited by 15 · Related articles · All 3 versions

[PDF] arxiv.org

Sample-based bounds for coherent risk measures: Applications to policy synthesis and verification

P Akella, A Dixit, M Ahmadi, JW Burdick, AD Ames - Artificial Intelligence, 2024 - Elsevier

Autonomous systems are increasingly used in highly variable and uncertain environments
giving rise to the pressing need to consider risk in both the synthesis and verification of …

Cited by 20 · Related articles · All 5 versions

[PDF] mlr.press

Adaptive best-of-both-worlds algorithm for heavy-tailed multi-armed bandits

J Huang, Y Dai, L Huang - International Conference on …, 2022 - proceedings.mlr.press

In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial
environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi …

Cited by 17 · Related articles · All 3 versions

Truthful user recruitment for cooperative crowdsensing task: A combinatorial multi-armed bandit approach

H Wang, Y Yang, E Wang, W Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Mobile Crowdsensing (MCS) is a promising paradigm that recruits users to cooperatively
perform a sensing task. When recruiting users, existing works mainly focus on selecting a …

Cited by 23 · Related articles · All 3 versions

[PDF] neurips.cc

[PDF] Optimal algorithms for stochastic multi-armed bandits with heavy tailed rewards

K Lee, H Yang, S Lim, S Oh - Advances in Neural …, 2020 - proceedings.neurips.cc

Cited by 26 · Related articles · All 9 versions

[PDF] neurips.cc

Optimal best-arm identification methods for tail-risk measures

S Agrawal, WM Koolen… - Advances in Neural …, 2021 - proceedings.neurips.cc

Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in
finance and insurance industries as well as in highly reliable, safety-critical uncertain …

Cited by 21 · Related articles · All 7 versions
