

Risk-aware multi-armed bandits with refined upper confidence bounds

Authors

Xingchi Liu, Mahsa Derakhshani, Sangarapillai Lambotharan, Mihaela Van der Schaar

Publication date

2020/12/28

Journal

IEEE Signal Processing Letters

Volume

Pages

269-273

Publisher

IEEE

Description

The classical multi-armed bandit (MAB) framework studies the exploration-exploitation dilemma in decision-making problems and always treats the arm with the highest expected reward as the optimal choice. However, in some applications, an arm with a high expected reward can be risky to play if its reward variance is also high. Hence, the variation of the reward should be taken into account to make the arm-selection process risk-aware. In this letter, the mean-variance metric is investigated as a measure of the uncertainty of the received rewards. We first study a risk-aware MAB problem in which the reward follows a Gaussian distribution, and we develop a concentration inequality on the variance to design a Gaussian risk-aware upper confidence bound algorithm. Furthermore, we extend this algorithm to a novel asymptotic risk-aware upper confidence bound algorithm by developing an upper confidence bound of the variance based …
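The description does not state the exact index used by the Gaussian or asymptotic risk-aware UCB algorithms, so the following is only a minimal sketch of the general idea. It assumes Gaussian rewards and a reward-oriented mean-variance score μ − ρσ², where higher is better. The risk weight ρ (`rho`), the (1 + ρ)·sqrt(2 ln t / n) exploration bonus, and every function and class name are hypothetical choices made here for illustration; they are not the authors' definitions. Some prior work instead minimizes σ² − ρμ. The sketch tracks each arm's running mean and variance and plays the arm with the highest optimistic mean-variance index.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Running mean and variance of one arm's rewards (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, reward: float) -> None:
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    @property
    def variance(self) -> float:
        # Population (biased) variance; 0.0 before any sample arrives.
        return self.m2 / self.n if self.n > 0 else 0.0


def mean_variance(mean: float, variance: float, rho: float) -> float:
    """Hypothetical reward-oriented mean-variance score: higher is better."""
    return mean - rho * variance


def ucb_index(s: ArmStats, t: int, rho: float) -> float:
    """Hypothetical optimistic index: empirical score plus an exploration bonus.

    This is NOT the paper's index; the bonus is an illustrative assumption.
    """
    bonus = (1.0 + rho) * math.sqrt(2.0 * math.log(t) / s.n)
    return mean_variance(s.mean, s.variance, rho) + bonus


def risk_aware_ucb(arms, horizon, rho=1.0, seed=0):
    """Play Gaussian arms given as (mu, sigma) pairs; return pull counts per arm."""
    rng = random.Random(seed)
    stats = [ArmStats() for _ in arms]
    t = 0
    # Initialization: pull every arm twice so each variance estimate uses 2 samples.
    for _ in range(2):
        for i, (mu, sigma) in enumerate(arms):
            stats[i].update(rng.gauss(mu, sigma))
            t += 1
    while t < horizon:
        t += 1
        best = max(range(len(arms)), key=lambda i: ucb_index(stats[i], t, rho))
        mu, sigma = arms[best]
        stats[best].update(rng.gauss(mu, sigma))
    return [s.n for s in stats]


if __name__ == "__main__":
    # Arm 0: lower mean, tiny variance. Arm 1: higher mean, large variance.
    # With rho = 1, arm 0 has the better mean-variance score (1.0 - 0.01 vs 1.2 - 4.0),
    # so it is expected to receive most of the pulls.
    print(risk_aware_ucb([(1.0, 0.1), (1.2, 2.0)], horizon=2000, rho=1.0))
```

Welford's update keeps only three numbers per arm instead of storing every reward, and setting `rho = 0` reduces the score to the plain empirical mean, i.e. a risk-neutral UCB-style rule.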

Total citations

Cited by 15

Citations per year: 2021: 2, 2022: 5, 2023: 5, 2024: 3

Scholar articles

Risk-aware multi-armed bandits with refined upper confidence bounds

X Liu, M Derakhshani, S Lambotharan… - IEEE Signal Processing Letters, 2020

Cited by 15 · All 3 versions