Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm
maintains an empirical model of the FMDP in a conventional way, and always follows a
greedy policy with respect to its model. The only trick of the algorithm is that the model is
initialized optimistically. We prove that, with suitable initialization, (i) FOIM converges to the
fixed point of approximate value iteration (AVI); (ii) the number of steps when the agent …