Authors
Andrew G Barto, Sridhar Mahadevan
Publication date
2003/10
Source
Discrete event dynamic systems
Volume
13
Pages
341-379
Publisher
Kluwer Academic Publishers
Description
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent …
Total citations
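Citations per year: 2003: 15, 2004: 38, 2005: 64, 2006: 74, 2007: 75, 2008: 79, 2009: 52, 2010: 65, 2011: 53, 2012: 80, 2013: 70, 2014: 61, 2015: 53, 2016: 76, 2017: 67, 2018: 90, 2019: 115, 2020: 106, 2021: 131, 2022: 126, 2023: 162, 2024: 78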