Value-aware loss function for model-based reinforcement learning

A Farahmand, A Barreto… - Artificial Intelligence and …, 2017 - proceedings.mlr.press
Artificial Intelligence and Statistics, 2017 - proceedings.mlr.press
Abstract
We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem and the RL algorithm that is intended to solve it. We introduce a loss function that takes the structure of the value function into account. We provide a finite-sample upper bound for the loss function showing the dependence of the error on the model approximation error, the number of samples, and the complexity of the model space. We also empirically compare the method with the maximum likelihood estimator on a simple problem.
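
The contrast between a probabilistic loss and a value-aware loss can be illustrated with a small tabular sketch. The abstract does not state the paper's loss, so everything below is an assumption made for illustration: a single state-action pair with three possible next states, one fixed value vector `v`, and a hypothetical `value_aware_loss` defined as the squared difference in expected next-state value under the true and the estimated transition distributions. It is not the authors' definition.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with full support.

    Minimizing this in q corresponds to minimizing the expected log-loss.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def value_aware_loss(p, q, v):
    """Illustrative loss (an assumption, not the paper's definition):
    squared difference in expected next-state value under p and q.
    """
    p, q, v = (np.asarray(a, dtype=float) for a in (p, q, v))
    return float(np.dot(p - q, v) ** 2)


# True next-state distribution for one fixed state-action pair.
p_true = np.array([0.5, 0.3, 0.2])

# Assumed value function: states 0 and 1 are equally valuable.
v = np.array([1.0, 1.0, 0.0])

# Model A confuses states 0 and 1, which have the same value.
model_a = np.array([0.3, 0.5, 0.2])

# Model B moves probability mass from state 1 to the low-value state 2.
model_b = np.array([0.5, 0.2, 0.3])

for name, q in (("A", model_a), ("B", model_b)):
    print(
        f"model {name}: "
        f"KL = {kl_divergence(p_true, q):.4f}, "
        f"value-aware = {value_aware_loss(p_true, q, v):.4f}"
    )
```

In this toy setting, model A is the worse of the two under KL divergence, yet it reproduces the expected next-state value exactly, so its errors do not affect a planner that only uses `v`. Model B is closer in KL but distorts the expected value.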