Average reward reinforcement learning: Foundations, algorithms, and empirical results

Author

Sridhar Mahadevan

Publication date

1996/1

Journal

Machine learning

Volume

Issue

Pages

159-195

Publisher

Kluwer Academic Publishers-Plenum Publishers

Description

This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric called n-discount-optimality is introduced, and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms: while several algorithms can provably generate gain-optimal …

Total citations

Cited by 627

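Citations per year:
1996: 11, 1997: 12, 1998: 10, 1999: 13, 2000: 6, 2001: 10, 2002: 25, 2003: 22,
2004: 19, 2005: 17, 2006: 20, 2007: 29, 2008: 20, 2009: 17, 2010: 25, 2011: 24,
2012: 25, 2013: 22, 2014: 15, 2015: 17, 2016: 17, 2017: 13, 2018: 25, 2019: 37,
2020: 35, 2021: 44, 2022: 32, 2023: 45, 2024: 16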
