Authors
Xingchi Liu, Mahsa Derakhshani, Sangarapillai Lambotharan, Mihaela Van der Schaar
Publication date
2020/12/28
Journal
IEEE Signal Processing Letters
Volume
28
Pages
269-273
Publisher
IEEE
Description
The classical multi-armed bandit (MAB) framework studies the exploration-exploitation dilemma of the decision-making problem and always treats the arm with the highest expected reward as the optimal choice. However, in some applications, an arm with a high expected reward can be risky to play if the variance is high. Hence, the variation of the reward should be considered to make the arm-selection process risk-aware. In this letter, the mean-variance metric is investigated to measure the uncertainty of the received rewards. We first study a risk-aware MAB problem when the reward follows a Gaussian distribution, and a concentration inequality on the variance is developed to design a Gaussian risk aware-upper confidence bound algorithm. Furthermore, we extend this algorithm to a novel asymptotic risk aware-upper confidence bound algorithm by developing an upper confidence bound of the variance based …
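As a rough illustration of the mean-variance idea mentioned in the description, the sketch below scores each arm by its empirical variance minus a weighted empirical mean and picks the lowest score. This is one common convention for the mean-variance metric and may differ in sign or weighting from the letter's definition; the names `mean_variance`, `least_risky_arm`, and the risk-tolerance parameter `rho` are hypothetical, and the confidence bounds that the letter's algorithms add on top are not reproduced here.

```python
import statistics


def mean_variance(rewards, rho=1.0):
    """Empirical mean-variance score of one arm (assumed convention).

    Score = variance - rho * mean, so a lower score is better.
    A larger rho puts more weight on the mean and less on risk.
    """
    mean = statistics.fmean(rewards)
    variance = statistics.pvariance(rewards, mu=mean)
    return variance - rho * mean


def least_risky_arm(samples_by_arm, rho=1.0):
    """Return the index of the arm with the lowest mean-variance score."""
    scores = [mean_variance(rewards, rho) for rewards in samples_by_arm]
    return min(range(len(scores)), key=scores.__getitem__)


# Illustrative reward samples (made up for this sketch).
arms = [
    [1.0, 1.1, 0.9, 1.0],    # steady arm: mean 1.0, small variance
    [4.0, -1.0, 4.0, -1.0],  # volatile arm: mean 1.5, large variance
]

print(least_risky_arm(arms, rho=1.0))    # risk dominates the score
print(least_risky_arm(arms, rho=100.0))  # mean dominates the score
```

With a small `rho` the steady arm is preferred even though its mean is lower; with a large `rho` the choice reverts to the arm with the higher mean, which is the classical MAB preference.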
Total citations
Citations per year, 2021 to 2024 (chart)
Scholar articles
X Liu, M Derakhshani, S Lambotharan… - IEEE Signal Processing Letters, 2020