Authors
Amy McGovern, Richard S Sutton
Publication date
1998/1/1
Journal
Computer Science Department Faculty Publication Series
Pages
15
Description
Several researchers have proposed reinforcement learning methods that obtain advantages in learning by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on the speed with which the learning process propagates accurate value information. We empirically measure the separate contributions of these two effects in gridworld and simulated robotic environments. In these environments, both effects were significant, but the effect of value propagation was larger. We also compare the accelerations of value propagation due to macro-actions and eligibility traces in the gridworld environment. Although eligibility traces increased the rate of convergence to the optimal value …
Total citations
Citations per year, 1998 to 2024 (bar chart; individual yearly counts are not recoverable from the extracted text)
Scholar articles
A McGovern, RS Sutton - Computer Science Department Faculty Publication …, 1998