Authors
Shie Mannor, Ishai Menache, Amit Hoze, Uri Klein
Publication date
2004/7/4
Conference
Proceedings of the twenty-first international conference on Machine learning
Pages
71
Publisher
ACM
Description
We consider a graph-theoretic approach for automatic construction of options in a dynamic environment. A map of the environment is generated on-line by the learning agent, representing the topological structure of the state transitions. A clustering algorithm is then used to partition the state space into different regions. Policies for reaching the different parts of the space are separately learned and added to the model in the form of options (macro-actions). The options are used for accelerating the Q-Learning algorithm. We extend the basic algorithm and consider building a map that includes a preliminary indication of the location of "interesting" regions of the state space, where the value gradient is significant and additional exploration might be beneficial. Experiments indicate significant speedups, especially in the initial learning phase.
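The first two steps in the description (building a map from observed state transitions, then partitioning it into regions) can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, so the code below swaps in a simple stand-in: it cuts every bottleneck edge (an edge whose removal disconnects the map) and treats the remaining connected components as regions. All function names, the grid coordinates, and the two-room layout are hypothetical and chosen only for illustration; learning the options and running Q-Learning are not shown.

```python
from collections import defaultdict, deque


def build_map(transitions):
    """Build an undirected adjacency map from observed (state, next_state) pairs."""
    graph = defaultdict(set)
    for s, s_next in transitions:
        graph[s].add(s_next)
        graph[s_next].add(s)
    return graph


def reachable(graph, start, skip_edge=None):
    """Breadth-first search from start, optionally ignoring one edge."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if skip_edge is not None and {u, v} == set(skip_edge):
                continue
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bottleneck_edges(graph):
    """Edges whose removal disconnects their endpoints (stand-in clustering criterion)."""
    edges = {tuple(sorted((u, v))) for u in graph for v in graph[u]}
    return {e for e in edges if e[1] not in reachable(graph, e[0], skip_edge=e)}


def partition(graph):
    """Cut the bottleneck edges and return the remaining connected components."""
    cut = bottleneck_edges(graph)
    pruned = {
        u: {v for v in nbrs if tuple(sorted((u, v))) not in cut}
        for u, nbrs in graph.items()
    }
    regions, assigned = [], set()
    for s in sorted(pruned):
        if s not in assigned:
            region = reachable(pruned, s)
            regions.append(region)
            assigned |= region
    return regions


# Hypothetical environment: two 2x2 rooms joined through a single doorway cell.
room_a = [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 0)), ((1, 0), (0, 0))]
room_b = [((3, 0), (3, 1)), ((3, 1), (4, 1)), ((4, 1), (4, 0)), ((4, 0), (3, 0))]
doorway = [((1, 0), (2, 0)), ((2, 0), (3, 0))]

regions = partition(build_map(room_a + doorway + room_b))
```

In this toy layout the two edges touching the doorway cell are the only bottlenecks, so the rooms and the doorway come out as separate regions. Under the approach described above, each region would then become the target of a separately learned option.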
Total citations
[Citations-per-year bar chart, 2004 to 2024; the per-year counts are not recoverable from the extracted text]
Scholar articles
S Mannor, I Menache, A Hoze, U Klein - Proceedings of the twenty-first international conference …, 2004