Authors
Pravin Varaiya, Jean Walrand, Cagatay Buyukkoc
Publication date
1985/5
Journal
IEEE Transactions on Automatic Control
Volume
30
Issue
5
Pages
426-439
Publisher
IEEE
Description
There are N independent machines. Machine i is described by a sequence {X^{i}(s), F^{i}(s)}, where X^{i}(s) is the immediate reward and F^{i}(s) is the information available before i is operated for the sth time. At each time one operates exactly one machine; idle machines remain frozen. The problem is to schedule the operation of the machines so as to maximize the expected total discounted sequence of rewards. An elementary proof shows that to each machine is associated an index, and the optimal policy operates the machine with the largest current index. When the machines are completely observed Markov chains, this coincides with the well-known Gittins index rule, and new algorithms are given for calculating the index. A reformulation of the bandit problem yields the tax problem, which includes, as a special case, Klimov's waiting time problem. Using the concept of superprocess, an index rule is derived for the case …
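The index rule described above can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it covers only the simplest special case, in which each machine's reward sequence is deterministic, finite, and nonnegative. In that case the index at position s can be taken as the largest discounted average reward over the next k operations, for k >= 1, which is the usual form of the Gittins index when nothing is random. The function names `index` and `run_index_policy`, the list-of-lists machine encoding, and the sample rewards are all hypothetical.

```python
def index(rewards, s, beta):
    """Index of a deterministic machine that has already been operated s times.

    Assumption: rewards are nonnegative and known in advance, so the index is
    the best ratio (discounted reward) / (discounted time) over the next k pulls.
    """
    best = 0.0
    num = den = 0.0
    disc = 1.0
    for r in rewards[s:]:
        num += disc * r      # discounted reward collected over the next k pulls
        den += disc          # discounted time spent over the same k pulls
        disc *= beta
        best = max(best, num / den)
    return best


def run_index_policy(machines, beta):
    """Operate one machine per step, always the one with the largest index.

    Idle machines stay frozen: only the operated machine's position advances.
    Returns the total discounted reward and the order of operation.
    """
    pos = [0] * len(machines)
    total, disc, order = 0.0, 1.0, []
    while any(p < len(m) for p, m in zip(pos, machines)):
        # Only machines with rewards left are candidates.
        live = [i for i, m in enumerate(machines) if pos[i] < len(m)]
        i = max(live, key=lambda j: index(machines[j], pos[j], beta))
        total += disc * machines[i][pos[i]]
        pos[i] += 1
        disc *= beta
        order.append(i)
    return total, order


if __name__ == "__main__":
    # Hypothetical reward sequences for three machines, discount factor 0.9.
    total, order = run_index_policy([[1, 10], [5, 0], [3, 3]], beta=0.9)
    print(order, round(total, 4))
```

In this example the first machine is operated first even though its immediate reward (1) is the smallest, because its index accounts for the reward of 10 that becomes available after one operation. A greedy rule based on immediate reward alone would start with the second machine instead.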
Total citations
[Bar chart: citations per year, 1985 to 2024]
Scholar articles
P Varaiya, J Walrand, C Buyukkoc - IEEE transactions on automatic control, 1985