Authors
Venkatachalam Anantharam, Pravin Varaiya, Jean Walrand
Publication date
1987/11
Journal
IEEE Transactions on Automatic Control
Volume
32
Issue
11
Pages
968-976
Publisher
IEEE
Description
At each instant of time we are required to sample a fixed number m ≥ 1 out of N i.i.d. processes whose distributions belong to a family suitably parameterized by a real number θ. The objective is to maximize the long run total expected value of the samples. Following Lai and Robbins, the learning loss of a sampling scheme corresponding to a configuration of parameters C = (θ_1, ..., θ_N) is quantified by the regret R_n(C). This is the difference between the maximum expected reward at time n that could be achieved if C were known and the expected reward actually obtained by the sampling scheme. We provide a lower bound for the regret associated with any uniformly good scheme, and construct a scheme which attains the lower bound for every configuration C. The lower bound is given explicitly in terms of the Kullback-Leibler number between pairs of distributions. Part II of this paper considers the same problem …
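The regret described above can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the use of mean rewards to summarize each process, and the example numbers are all assumptions made for illustration. It compares the expected reward of an oracle that always samples the processes with the largest means against the expected reward of a scheme, given how often that scheme is expected to sample each process.

```python
from typing import Sequence


def oracle_reward(means: Sequence[float], m: int, n: int) -> float:
    """Expected total reward after n steps if the m largest means were known
    and those m processes were sampled at every step (hypothetical helper)."""
    return n * sum(sorted(means, reverse=True)[:m])


def regret(means: Sequence[float], m: int, n: int,
           expected_counts: Sequence[float]) -> float:
    """Oracle reward minus the expected reward of a sampling scheme.

    expected_counts[j] is the expected number of times the scheme samples
    process j during the first n steps; the counts must sum to m * n because
    exactly m processes are sampled at each step.
    """
    if abs(sum(expected_counts) - m * n) > 1e-9:
        raise ValueError("expected_counts must sum to m * n")
    achieved = sum(mu * t for mu, t in zip(means, expected_counts))
    return oracle_reward(means, m, n) - achieved


# Illustrative numbers only: four processes, two sampled per step, and a
# naive scheme that spreads its samples evenly over all processes.
means = [0.9, 0.8, 0.5, 0.2]
m, n = 2, 100
uniform_counts = [m * n / len(means)] * len(means)
print(regret(means, m, n, uniform_counts))
```

The even-spread scheme in the usage lines has regret that grows linearly in the number of steps, which is the kind of behavior a uniformly good scheme is meant to avoid.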
Scholar articles
V Anantharam, P Varaiya, J Walrand - IEEE Transactions on Automatic Control, 1987