Optimistic initialization and greediness lead to polynomial time learning in factored MDPs

I Szita, A Lőrincz - Proceedings of the 26th annual international …, 2009 - dl.acm.org
In this paper we propose an algorithm for polynomial-time reinforcement learning in factored
Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm
maintains an empirical model of the FMDP in a conventional way, and always follows a
greedy policy with respect to its model. The only trick of the algorithm is that the model is
initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the
fixed point of approximate value iteration (AVI); (ii) the number of steps when the agent …

Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

I Szita, A Lőrincz - arXiv preprint arXiv:0904.3352, 2009 - arxiv.org
In this paper we propose an algorithm for polynomial-time reinforcement learning in factored
Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm
maintains an empirical model of the FMDP in a conventional way, and always follows a
greedy policy with respect to its model. The only trick of the algorithm is that the model is
initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the
fixed point of approximate value iteration (AVI); (ii) the number of steps when the agent …