We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on …
J Hanna, P Stone, S Niekum - Proceedings of the AAAI Conference on …, 2017 - ojs.aaai.org
In many reinforcement learning applications, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. In this …
Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit …
P Thomas, E Brunskill - International Conference on …, 2016 - proceedings.mlr.press
In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The …
M Uehara, J Huang, N Jiang - International Conference on …, 2020 - proceedings.mlr.press
We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our …
T Xie, Y Ma, YX Wang - Advances in neural information …, 2019 - proceedings.neurips.cc
Motivated by the many real-world applications of reinforcement learning (RL) that require safe-policy iterations, we consider the problem of off-policy evaluation (OPE)---the problem …
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we …
Y Duan, Z Jia, M Wang - International Conference on …, 2020 - proceedings.mlr.press
This paper studies the statistical theory of off-policy evaluation with function approximation in the batch data reinforcement learning problem. We consider a regression-based fitted Q …
M Yin, YX Wang - International Conference on Artificial …, 2020 - proceedings.mlr.press
We consider the problem of off-policy evaluation for reinforcement learning, where the goal is to estimate the expected reward of a target policy $\pi$ using offline data collected by …
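As a hedged illustration of the OPE setting these entries share (estimating a target policy's value from data logged by a different behavior policy), here is a minimal sketch of the basic trajectory-wise importance-sampling (IS) estimator, paired with a simple percentile-bootstrap lower bound. It is a generic textbook construction, not the method of any paper listed here. The function names, the integer state/action encoding, and the assumption that behavior-policy probabilities are known and strictly positive for every logged action are all illustrative choices. The percentile bootstrap is only approximate and comes with no finite-sample coverage guarantee.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumed data layout: a trajectory is a list of (state, action, reward) steps.
Step = Tuple[int, int, float]
Trajectory = List[Step]
# Assumed policy interface: policy(state, action) -> probability of that action.
Policy = Callable[[int, int], float]


def is_returns(
    trajectories: Sequence[Trajectory],
    pi_target: Policy,
    pi_behavior: Policy,
    gamma: float = 1.0,
) -> List[float]:
    """Trajectory-wise IS: weight each discounted return by the product
    of per-step likelihood ratios pi_target / pi_behavior.
    Assumes pi_behavior(s, a) > 0 for every logged (s, a)."""
    weighted = []
    for traj in trajectories:
        weight = 1.0
        ret = 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_target(s, a) / pi_behavior(s, a)
            ret += (gamma ** t) * r
        weighted.append(weight * ret)
    return weighted


def is_estimate(values: Sequence[float]) -> float:
    """Unbiased point estimate: the sample mean of the IS-weighted returns."""
    return sum(values) / len(values)


def bootstrap_lower_bound(
    values: Sequence[float], delta: float = 0.05, n_boot: int = 2000, seed: int = 0
) -> float:
    """Approximate (1 - delta) lower bound via the percentile bootstrap:
    resample with replacement, recompute the mean, take the delta-quantile."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    return means[int(delta * n_boot)]


if __name__ == "__main__":
    # Toy one-step example: behavior picks actions 0/1 uniformly,
    # target always picks action 1, and only action 1 is rewarded.
    behavior = lambda s, a: 0.5
    target = lambda s, a: 1.0 if a == 1 else 0.0
    data = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
    vals = is_returns(data, target, behavior)
    print(vals, is_estimate(vals), bootstrap_lower_bound(vals))
```

In the toy example the weighted returns are 0.0 and 2.0, so the IS estimate is 1.0, matching the target policy's true value. Plain IS is unbiased but its variance grows with the product of ratios over long horizons, which is one motivation for the marginalized-weight and function-approximation approaches mentioned in several entries.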