
Revisiting the Acrobot 'height' task: An example of efficient evolutionary policy search under an episodic goal seeking task

J Doucette, MI Heywood - 2011 IEEE Congress of Evolutionary …, 2011 - ieeexplore.ieee.org

2011 IEEE Congress of Evolutionary Computation (CEC), 2011•ieeexplore.ieee.org

Evolutionary methods for addressing the temporal sequence learning problem generally fall into policy search as opposed to value function optimization approaches. Various recent results have made the claim that the policy search approach is at best inefficient at solving episodic 'goal seeking' tasks, i.e., tasks under which the reward is limited to describing properties associated with a successful outcome and has no qualification for degrees of failure. This work demonstrates that such a conclusion is due to a lack of diversity in the training scenarios. We therefore return to the Acrobot 'height' task domain originally used to demonstrate complete failure in evolutionary policy search. This time a very simple stochastic sampling heuristic for defining a population of training configurations is introduced. Benchmarking two recent evolutionary policy search algorithms, NeuroEvolution of Augmenting Topologies (NEAT) and Symbiotic Bid-Based (SBB) Genetic Programming, under this condition demonstrates solutions as effective as those returned by advanced value function methods. Moreover, this is achieved while remaining within the evaluation limit imposed by the original study.
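The abstract does not spell out the sampling heuristic, so the following is only a minimal sketch of the general idea: draw a population of diverse training start configurations at random and score a policy by how many of them it solves. The names `StartState`, `sample_start_states`, and `fitness`, the uniform angle range, the zero initial velocities, and the success-fraction score are all assumptions made for illustration, not the authors' method.

```python
import math
import random
from typing import Callable, List, NamedTuple


class StartState(NamedTuple):
    """Hypothetical Acrobot start configuration (angles in radians)."""
    theta1: float   # angle of the first link
    theta2: float   # angle of the second link relative to the first
    dtheta1: float  # angular velocity of the first link
    dtheta2: float  # angular velocity of the second link


def sample_start_states(n: int, rng: random.Random,
                        max_angle: float = math.pi) -> List[StartState]:
    """Draw n start configurations with uniformly random joint angles, at rest.

    Assumption: uniform sampling over [-max_angle, max_angle] with zero
    velocities; the paper's actual heuristic may differ.
    """
    return [
        StartState(rng.uniform(-max_angle, max_angle),
                   rng.uniform(-max_angle, max_angle),
                   0.0, 0.0)
        for _ in range(n)
    ]


def fitness(policy_succeeds: Callable[[StartState], bool],
            states: List[StartState]) -> float:
    """Fraction of sampled start states from which the policy reaches the goal.

    `policy_succeeds` is a placeholder for running one episode in a simulator
    and reporting whether the goal was reached.
    """
    return sum(1 for s in states if policy_succeeds(s)) / len(states)
```

Under this sketch, a policy that only succeeds from a single canonical start state earns a low score, which gives the search a gradient of partial success even though each individual episode still reports only success or failure.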
