Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards

Authors

Venkatachalam Anantharam, Pravin Varaiya, Jean Walrand

Publication date

1987/11

Journal

IEEE Transactions on Automatic Control

Volume

Issue

Pages

968-976

Publisher

IEEE

Description

At each instant of time we are required to sample a fixed number $m \geq 1$ out of $N$ i.i.d. processes whose distributions belong to a family suitably parameterized by a real number $\theta$. The objective is to maximize the long-run total expected value of the samples. Following Lai and Robbins, the learning loss of a sampling scheme corresponding to a configuration of parameters $C = (\theta_1, \ldots, \theta_N)$ is quantified by the regret $R_n(C)$. This is the difference between the maximum expected reward at time $n$ that could be achieved if $C$ were known and the expected reward actually obtained by the sampling scheme. We provide a lower bound for the regret associated with any uniformly good scheme, and construct a scheme which attains the lower bound for every configuration $C$. The lower bound is given explicitly in terms of the Kullback-Leibler number between pairs of distributions. Part II of this paper considers the same problem …
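The two quantities named in the description, the regret and the Kullback-Leibler number, can be illustrated with a small sketch. It is not taken from the paper. It assumes Bernoulli reward distributions (the paper treats a general one-parameter family), and the names `bernoulli_kl`, `regret`, and `expected_pulls` are hypothetical, chosen only for this example.

```python
import math


def bernoulli_kl(p, q):
    """Kullback-Leibler number between Bernoulli(p) and Bernoulli(q).

    Assumption for illustration: rewards are Bernoulli, with 0 < q < 1.
    """
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def regret(means, m, n, expected_pulls):
    """Regret after n steps when m arms are sampled at each step.

    means:          mean reward of each arm (the known configuration)
    expected_pulls: expected number of times each arm was sampled;
                    the entries should sum to m * n
    """
    # Best achievable expected reward: always sample the m best arms.
    best = n * sum(sorted(means, reverse=True)[:m])
    # Expected reward actually collected by the sampling scheme.
    obtained = sum(mu * t for mu, t in zip(means, expected_pulls))
    return best - obtained


if __name__ == "__main__":
    means = [0.9, 0.8, 0.5]
    # Over 100 steps with 2 plays per step, the worst arm is sampled 10 times.
    print(regret(means, m=2, n=100, expected_pulls=[100, 90, 10]))
    print(bernoulli_kl(0.5, 0.8))
```

In this sketch the regret grows only through samples spent on arms outside the best $m$, and the Kullback-Leibler number measures how hard such an arm is to tell apart from a better one.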

Total citations

Cited by 388

[Chart: citations per year, 1987 to 2024]
