Optimistic initialization and greediness lead to polynomial time learning in factored MDPs


I Szita, A Lőrincz - Proceedings of the 26th annual international …, 2009 - dl.acm.org

In this paper we propose an algorithm for polynomial-time reinforcement learning in factored
Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm
maintains an empirical model of the FMDP in a conventional way, and always follows a
greedy policy with respect to its model. The only trick of the algorithm is that the model is
initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the
fixed point of approximate value iteration (AVI); (ii) the number of steps when the agent …
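
The idea in the abstract (an empirical model, a purely greedy policy, and an optimistic starting model) can be illustrated with a rough sketch. This is not the authors' FOIM: it is a flat, tabular simplification with no factored structure, and the class name `OptimisticModelAgent`, the fictitious absorbing state `eden`, and the parameter `v_optimistic` are all assumptions made for illustration. Each state-action pair starts with one hypothetical transition to a fictitious high-value state, so untried actions look attractive until real data outweighs that initial guess.

```python
import numpy as np


class OptimisticModelAgent:
    """Tabular sketch: empirical model + optimistic initialization + greedy policy.

    Illustrative assumption only; not the factored algorithm from the paper.
    """

    def __init__(self, n_states, n_actions, gamma=0.9, v_optimistic=100.0):
        self.n_states = n_states
        self.n_actions = n_actions
        self.gamma = gamma
        self.v_optimistic = v_optimistic  # hypothetical value of the fictitious state
        self.eden = n_states              # index of the fictitious absorbing state
        # counts[s, a, s']: every pair starts with one pretend visit to `eden`.
        self.counts = np.zeros((n_states, n_actions, n_states + 1))
        self.counts[:, :, self.eden] = 1.0
        self.reward_sum = np.zeros((n_states, n_actions))
        self.q = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        """Record one observed transition in the empirical model."""
        self.counts[s, a, s_next] += 1.0
        self.reward_sum[s, a] += r

    def plan(self, sweeps=200):
        """Run value iteration on the current (optimistically biased) model."""
        n = self.counts.sum(axis=2)       # visit counts, including the pretend visit
        p = self.counts / n[:, :, None]   # estimated transition probabilities
        r = self.reward_sum / n           # the pretend visit contributes zero reward
        v = np.zeros(self.n_states + 1)
        v[self.eden] = self.v_optimistic  # value of the fictitious state stays fixed
        for _ in range(sweeps):
            self.q = r + self.gamma * (p @ v)
            v[: self.n_states] = self.q.max(axis=1)

    def act(self, s):
        """Always greedy with respect to the model; there is no exploration noise."""
        return int(np.argmax(self.q[s]))
```

In this sketch, exploration comes only from the initial bias: after action 0 has been tried a few times in a state, its estimated value drops below that of an untried action, so the greedy choice switches to the untried one.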

Cited by 33 · Related articles · All 10 versions

[PDF] arxiv.org

Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

I Szita, A Lorincz - arXiv preprint arXiv:0904.3352, 2009 - arxiv.org

Cited by 1 · Related articles · All 4 versions
