Authors
Venkatachalam Anantharam, Pravin Varaiya, Jean Walrand
Publication date
1987/11
Journal
IEEE Transactions on Automatic Control
Volume
32
Issue
11
Pages
968-976
Publisher
IEEE
Description
At each instant of time we are required to sample a fixed number m ≥ 1 out of N i.i.d. processes whose distributions belong to a family suitably parameterized by a real number θ. The objective is to maximize the long run total expected value of the samples. Following Lai and Robbins, the learning loss of a sampling scheme corresponding to a configuration of parameters C = (θ_1, ..., θ_N) is quantified by the regret R_n(C). This is the difference between the maximum expected reward at time n that could be achieved if C were known and the expected reward actually obtained by the sampling scheme. We provide a lower bound for the regret associated with any uniformly good scheme, and construct a scheme which attains the lower bound for every configuration C. The lower bound is given explicitly in terms of the Kullback-Leibler number between pairs of distributions. Part II of this paper considers the same problem …
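The regret described above can be illustrated with a minimal sketch. It is not the paper's scheme, only the bookkeeping behind the definition. It assumes each process has a known mean reward and that we are given how often each process was sampled. The names `regret`, `means`, `m`, and `pull_counts` are hypothetical and chosen only for this illustration.

```python
def regret(means, m, pull_counts):
    """Regret after n rounds when m of the N processes are sampled per round.

    means[j]       -- mean reward of process j (assumed known, for illustration only)
    pull_counts[j] -- number of times process j was sampled over the n rounds
    """
    total_pulls = sum(pull_counts)
    if total_pulls % m != 0:
        raise ValueError("pull counts must sum to n * m")
    n = total_pulls // m
    # Benchmark: with known parameters, sample the m highest-mean processes every round.
    best_per_round = sum(sorted(means, reverse=True)[:m])
    # Expected reward collected under the given sampling counts.
    obtained = sum(mu * count for mu, count in zip(means, pull_counts))
    return n * best_per_round - obtained


if __name__ == "__main__":
    # Hypothetical configuration: 4 processes, 2 sampled per round, 100 rounds.
    print(regret([0.9, 0.8, 0.5, 0.2], 2, [100, 90, 8, 2]))
```

Under these assumptions, only samples spent on processes outside the m best contribute to the regret, which is why a good scheme must keep those counts small.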
Total citations
[Bar chart: citations per year, 1987 to 2024]