CORe: Capitalizing on rewards in bandit exploration



N Wang, B Kveton, M Karimzadehgan

Uncertainty in Artificial Intelligence, 2021 • proceedings.mlr.press

Abstract

We propose a bandit algorithm that explores purely by randomizing its past observations. In particular, the sufficient optimism in the mean reward estimates is achieved by exploiting the variance in the past observed rewards. We name the algorithm Capitalizing On Rewards (CORe). The algorithm is general and can be easily applied to different bandit settings. The main benefit of CORe is that its exploration is fully data-dependent. It does not rely on any external noise and adapts to different problems without parameter tuning. We derive a $\tilde{O}(d \sqrt{n \log K})$ gap-free bound on the $n$-round regret of CORe in a stochastic linear bandit, where $d$ is the number of features and $K$ is the number of arms. Extensive empirical evaluation on multiple synthetic and real-world problems demonstrates the effectiveness of CORe.


Cited by: 2 (all 7 versions)
