
[PDF] from jmlr.org

Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems.

Authors

Eyal Even-Dar, Shie Mannor, Yishay Mansour, Sridhar Mahadevan

Publication date

2006/6/1

Journal

Journal of Machine Learning Research

Volume

Issue

Description

We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide both model-based and model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over ε-greedy Q-learning.
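
To make the bandit claim concrete, here is a minimal Python sketch of a median-elimination-style procedure: in each round every surviving arm is pulled the same number of times, the worse half (by empirical mean) is dropped, and the accuracy and confidence parameters are tightened. The function names, the Bernoulli reward model, and the specific constants (ε/4, δ/2, the 3/4 shrink factor, and the per-round sample size) are assumptions made for illustration; the abstract does not state them, and this should not be read as the authors' exact algorithm.

```python
import math
import random


def make_bernoulli_arms(means, seed=0):
    """Hypothetical test environment: arm i pays 1 with probability means[i]."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < means[arm] else 0.0

    return pull


def median_elimination(pull, n_arms, epsilon, delta):
    """Return (index of the surviving arm, total number of pulls used).

    Constants below are illustrative assumptions, not taken from the abstract.
    """
    arms = list(range(n_arms))
    eps_l, delta_l = epsilon / 4.0, delta / 2.0
    total_pulls = 0
    while len(arms) > 1:
        # Number of pulls per surviving arm in this round.
        t = math.ceil(4.0 / eps_l ** 2 * math.log(3.0 / delta_l))
        means = {a: sum(pull(a) for _ in range(t)) / t for a in arms}
        total_pulls += t * len(arms)
        # Keep the better half (rounded up) of the arms by empirical mean.
        ranked = sorted(arms, key=means.get, reverse=True)
        arms = ranked[: (len(arms) + 1) // 2]
        # Tighten accuracy and confidence for the next round.
        eps_l *= 0.75
        delta_l /= 2.0
    return arms[0], total_pulls
```

Because the number of surviving arms halves each round while the per-arm sample size grows only geometrically, the total pull count in a scheme of this kind stays proportional to (n/ε²) log(1/δ) rather than picking up an extra log n factor. For example, `median_elimination(make_bernoulli_arms([0.2, 0.5, 0.9, 0.4]), 4, epsilon=0.2, delta=0.1)` returns the chosen arm index together with the number of pulls spent.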

Total citations

Cited by 763

Citations per year: 2006: 3, 2007: 5, 2008: 6, 2009: 11, 2010: 15, 2011: 14, 2012: 14, 2013: 13, 2014: 20, 2015: 26, 2016: 38, 2017: 37, 2018: 48, 2019: 62, 2020: 81, 2021: 87, 2022: 107, 2023: 96, 2024: 79

Scholar articles

Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems.

E Even-Dar, S Mannor, Y Mansour, S Mahadevan - Journal of machine learning research, 2006

Cited by 763 · All 12 versions