model-based reinforcement learning (RL) algorithm. We argue that estimating a generative
model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does
not take into account the underlying structure of the decision problem and the RL algorithm that
intends to solve it. We introduce a loss function that takes the structure of the value function
into account. We provide a finite-sample upper bound for the loss function showing the …