CoinDICE: Off-policy confidence interval estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …
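
CoinDICE itself is behavior-agnostic and obtains its interval via a generalized-empirical-likelihood optimization over stationary distribution corrections. As a simpler point of reference, here is a minimal sketch of the kind of baseline such methods improve on: a percentile-bootstrap confidence interval over per-trajectory importance-sampled returns, assuming (unlike CoinDICE) that the behavior policy is known; `pi_target` and `pi_behavior` are hypothetical callables.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_return(traj, pi_target, pi_behavior, gamma=0.99):
    """Per-trajectory importance-sampled return.
    traj: iterable of (state, action, reward) tuples."""
    w, g, disc = 1.0, 0.0, 1.0
    for s, a, r in traj:
        w *= pi_target(a, s) / pi_behavior(a, s)  # cumulative action-probability ratio
        g += disc * r
        disc *= gamma
    return w * g

def bootstrap_ci(values, alpha=0.05, n_boot=2000):
    """Percentile-bootstrap (1 - alpha) confidence interval for the mean."""
    values = np.asarray(values, dtype=float)
    means = np.array([rng.choice(values, size=values.size).mean()
                      for _ in range(n_boot)])
    return tuple(np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)]))
```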

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …
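
Stripped to a sketch, the statistical idea is to weight each sample in the value regression by an inverse estimate of its conditional variance, so that low-noise transitions dominate the fit. The toy weighted ridge regression below is illustrative only, not the paper's algorithm (which also specifies how the variance estimates themselves are obtained):

```python
import numpy as np

def variance_weighted_ridge(X, y, var_hat, reg=1e-3):
    """Solve for theta minimizing sum_i (y_i - x_i @ theta)^2 / var_hat_i
    plus reg * ||theta||^2.
    X: (n, d) features; y: (n,) regression targets; var_hat: (n,)
    estimated conditional variances of the targets."""
    w = 1.0 / np.maximum(var_hat, 1e-8)            # inverse-variance weights
    A = (X * w[:, None]).T @ X + reg * np.eye(X.shape[1])
    b = (X * w[:, None]).T @ y
    return np.linalg.solve(A, b)
```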

Bootstrapping with models: Confidence intervals for off-policy evaluation

J Hanna, P Stone, S Niekum - Proceedings of the AAAI Conference on Artificial Intelligence, 2017 - ojs.aaai.org
In many reinforcement learning applications, it is desirable to determine confidence interval
lower bounds on the performance of any given policy without executing said policy. In this …
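
The recipe can be sketched as follows (hedged: `fit_model` and `evaluate_policy` are placeholders for whatever model class and evaluator one uses): resample the logged trajectories with replacement, fit a model to each resample, evaluate the target policy in each fitted model, and take an empirical quantile of the resulting estimates as the lower bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_bootstrap_lower_bound(trajectories, fit_model, evaluate_policy,
                                n_boot=200, alpha=0.05):
    """Approximate (1 - alpha) lower confidence bound on the target
    policy's value via model-based bootstrapping."""
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(trajectories), size=len(trajectories))
        model = fit_model([trajectories[i] for i in idx])
        estimates.append(evaluate_policy(model))
    return np.quantile(estimates, alpha)
```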

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …

Data-efficient off-policy policy evaluation for reinforcement learning

P Thomas, E Brunskill - International Conference on Machine Learning, 2016 - proceedings.mlr.press
In this paper we present a new way of predicting the performance of a reinforcement
learning policy given historical data that may have been generated by a different policy. The …
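
A standard building block in this line of work, which the paper extends and blends with model-based estimates, is the per-decision doubly robust estimator. A minimal sketch (not the paper's full estimator), assuming known behavior-policy probabilities and user-supplied approximate value functions `q_hat` and `v_hat`:

```python
def dr_estimate(traj, pi_t, pi_b, q_hat, v_hat, gamma=0.99):
    """Per-decision doubly robust value estimate from one trajectory.
    traj: list of (s, a, r); pi_t(a, s) and pi_b(a, s) give target and
    behavior action probabilities; q_hat(s, a) and v_hat(s) are
    approximate value functions used as control variates."""
    total, w_prev, disc = 0.0, 1.0, 1.0
    for s, a, r in traj:
        w = w_prev * pi_t(a, s) / pi_b(a, s)   # cumulative ratio up to step t
        total += disc * (w * r - w * q_hat(s, a) + w_prev * v_hat(s))
        w_prev = w
        disc *= gamma
    return total
```

Averaging `dr_estimate` over trajectories gives the point estimate; when the approximate values are accurate, the control-variate terms cancel most of the importance-weight variance.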

Minimax weight and Q-function learning for off-policy evaluation

M Uehara, J Huang, N Jiang - International Conference on Machine Learning, 2020 - proceedings.mlr.press
We provide theoretical investigations into off-policy evaluation in reinforcement learning
using function approximators for (marginalized) importance weights and value functions. Our …
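
Up to notational and normalization conventions, the minimax weight-learning (MWL) idea can be restated as follows: the marginalized weight $w(s,a) \approx d^\pi(s,a)/d^\mu(s,a)$ is characterized by requiring, for every test function $f$,
$$
L(w, f) = \mathbb{E}_{(s,a,s') \sim d^\mu}\big[\, w(s,a)\,\big(\gamma f(s',\pi) - f(s,a)\big) \big] + (1-\gamma)\,\mathbb{E}_{s_0 \sim d_0}\big[ f(s_0,\pi) \big] = 0,
$$
where $f(s,\pi) = \sum_a \pi(a \mid s)\, f(s,a)$. The weight is then estimated as $\hat w = \arg\min_w \max_{f \in \mathcal{F}} L(w,f)^2$ over chosen function classes, and the policy value as a data average of $\hat w(s,a)\, r$; the companion MQL objective swaps the roles of the weight and Q-function classes.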

Towards optimal off-policy evaluation for reinforcement learning with marginalized importance sampling

T Xie, Y Ma, YX Wang - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
Motivated by the many real-world applications of reinforcement learning (RL) that require
safe-policy iterations, we consider the problem of off-policy evaluation (OPE)---the problem …
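
In the tabular finite-horizon setting, the marginalized approach can be sketched as a plug-in estimator (a simplified variant; the paper's estimator and analysis are more refined): estimate per-step transition and reward tables from the logged data, roll the target policy's state marginals forward, and accumulate expected rewards under those marginals instead of multiplying per-step action-probability ratios.

```python
import numpy as np

def tabular_mis_value(trajs, pi, n_states, n_actions, H):
    """Simplified marginalized value estimate for a finite-horizon
    tabular MDP. trajs: list of trajectories, each a length-H list of
    (s, a, r, s_next) with integer states/actions; pi: (n_states,
    n_actions) array of target-policy probabilities."""
    counts = np.zeros((H, n_states, n_actions, n_states))
    rsum = np.zeros((H, n_states, n_actions))
    d = np.zeros(n_states)                       # initial state marginal
    for traj in trajs:
        d[traj[0][0]] += 1.0 / len(trajs)
        for t, (s, a, r, sn) in enumerate(traj):
            counts[t, s, a, sn] += 1.0
            rsum[t, s, a] += r
    n_sa = np.maximum(counts.sum(axis=3), 1.0)
    P_hat = counts / n_sa[..., None]
    R_hat = rsum / n_sa
    value = 0.0
    for t in range(H):                           # forward recursion of d^pi_t
        value += np.einsum('s,sa,sa->', d, pi, R_hat[t])
        d = np.einsum('s,sa,san->n', d, pi, P_hat[t])
    return value
```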

Bootstrapping fitted q-evaluation for off-policy inference

B Hao, X Ji, Y Duan, H Lu… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
Bootstrapping provides a flexible and effective approach for assessing the quality of batch
reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we …
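
The construction under study can be sketched as a wrapper: rerun a fitted-Q-evaluation routine on bootstrap resamples of the transition dataset and read off percentile endpoints. Here `fqe` is a placeholder callable returning a scalar value estimate; the paper's contribution is the theory of when intervals of this kind are asymptotically valid, not this recipe itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_fqe_interval(transitions, fqe, n_boot=100, alpha=0.05):
    """Percentile-bootstrap confidence interval around fitted Q-evaluation."""
    n = len(transitions)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        estimates.append(fqe([transitions[i] for i in idx]))
    return tuple(np.percentile(estimates,
                               [100 * alpha / 2, 100 * (1 - alpha / 2)]))
```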

Minimax-optimal off-policy evaluation with linear function approximation

Y Duan, Z Jia, M Wang - International Conference on Machine Learning, 2020 - proceedings.mlr.press
This paper studies the statistical theory of off-policy evaluation with function approximation in the batch-data reinforcement learning problem. We consider a regression-based fitted Q …
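
A minimal sketch of the estimator family this analysis covers, regression-based fitted Q-evaluation with linear features (a deterministic target policy and a fixed ridge regularizer are simplifying assumptions of the sketch, not the paper's setting):

```python
import numpy as np

def linear_fqe(transitions, phi, pi, gamma=0.99, n_iters=200, reg=1e-3):
    """Fitted Q-evaluation with linear function approximation.
    transitions: list of (s, a, r, s_next); phi(s, a) -> (d,) feature
    vector; pi(s) -> the target policy's action. Each iteration
    regresses the target r + gamma * Q(s', pi(s')) onto phi(s, a)."""
    X = np.array([phi(s, a) for s, a, _, _ in transitions])
    rews = np.array([r for _, _, r, _ in transitions], dtype=float)
    Xn = np.array([phi(sn, pi(sn)) for _, _, _, sn in transitions])
    d = X.shape[1]
    theta = np.zeros(d)
    A = X.T @ X + reg * np.eye(d)                 # reused across iterations
    for _ in range(n_iters):
        y = rews + gamma * (Xn @ theta)           # bootstrapped regression targets
        theta = np.linalg.solve(A, X.T @ y)
    return theta      # value at s0: phi(s0, pi(s0)) @ theta
```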

Asymptotically efficient off-policy evaluation for tabular reinforcement learning

M Yin, YX Wang - International Conference on Artificial Intelligence and Statistics, 2020 - proceedings.mlr.press
We consider the problem of off-policy evaluation for reinforcement learning, where the goal
is to estimate the expected reward of a target policy $\pi$ using offline data collected by …
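
The snippet does not pin down the setting, so as a generic illustration only: the plug-in (certainty-equivalence) estimator for a discounted tabular MDP, against which efficiency results in this vein are typically benchmarked, estimates the transition and reward tables from counts and solves the target policy's Bellman equations exactly.

```python
import numpy as np

def tabular_plugin_values(transitions, pi, n_states, n_actions, gamma=0.99):
    """Plug-in OPE for a discounted tabular MDP. transitions: list of
    (s, a, r, s_next) with integer states/actions; pi: (n_states,
    n_actions) target-policy probabilities. Returns v^pi per state."""
    counts = np.zeros((n_states, n_actions, n_states))
    rsum = np.zeros((n_states, n_actions))
    for s, a, r, sn in transitions:
        counts[s, a, sn] += 1.0
        rsum[s, a] += r
    n_sa = np.maximum(counts.sum(axis=2), 1.0)
    P_hat = counts / n_sa[..., None]
    R_hat = rsum / n_sa
    P_pi = np.einsum('sa,san->sn', pi, P_hat)    # state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, R_hat)      # expected reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
```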