Authors
Sridhar Mahadevan, Jonathan Connell
Publication date
1992/6/1
Journal
Artificial Intelligence
Volume
55
Issue
2-3
Pages
311-365
Publisher
Elsevier
Description
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions.
  (1) The learning techniques are able to learn the individual behaviors, sometimes outperforming a …
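
The description mentions combining Q learning with Hamming distance so that reinforcement spreads to similar states. The sketch below is one minimal way to illustrate that idea and is not taken from the paper: it assumes states are equal-length bit tuples, and the names `hamming`, `q_update`, `seen`, `radius`, `alpha`, and `gamma` are hypothetical, as are the default parameter values.

```python
from collections import defaultdict


def hamming(a, b):
    """Count the positions where two equal-length state tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def q_update(q, seen, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9, radius=1):
    """Apply a one-step Q-learning update to `state` and to every
    previously seen state within `radius` bits of it (an assumed rule)."""
    # Standard Q-learning target: reward plus discounted best next value.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    seen.add(state)
    # Spread the same update to nearby states (spatial generalization).
    for s in seen:
        if hamming(s, state) <= radius:
            q[(s, action)] += alpha * (target - q[(s, action)])
    return q


if __name__ == "__main__":
    q = defaultdict(float)          # Q-values default to 0.0
    seen = {(0, 0, 1), (1, 1, 0)}   # states visited earlier (illustrative)
    q_update(q, seen, (0, 0, 0), "forward", 1.0, (1, 1, 1),
             ["forward", "turn"])
    print(q[((0, 0, 0), "forward")], q[((0, 0, 1), "forward")],
          q[((1, 1, 0), "forward")])
```

In this toy run the neighbor `(0, 0, 1)` receives the same update as the visited state, while `(1, 1, 0)`, which differs in two bits, is left unchanged.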
Total citations
[Chart: citations per year, 1991 to 2024]