Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - … conference on machine …, 2021 - proceedings.mlr.press
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …

CoinDICE: Off-policy confidence interval estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in neural …, 2020 - proceedings.neurips.cc
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in neural …, 2021 - proceedings.neurips.cc
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …

Multi-step off-policy learning without importance sampling ratios

AR Mahmood, H Yu, RS Sutton - arXiv preprint arXiv:1702.03006, 2017 - arxiv.org
To estimate the value functions of policies from exploratory data, most model-free off-policy
algorithms rely on importance sampling, where the use of importance sampling ratios often …
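
As context for the importance-sampling ratios this entry refers to, here is a minimal sketch of ordinary per-decision importance sampling for off-policy value estimation. The trajectory format and the target_policy / behavior_policy callables are hypothetical placeholders for illustration; this is the ratio-based baseline, not the ratio-free algorithm the paper proposes.

```python
import numpy as np

def per_decision_is_estimate(trajectories, target_policy, behavior_policy, gamma=0.99):
    """Per-decision importance-sampling estimate of the target policy's value.

    trajectories    : list of trajectories, each a list of (state, action, reward) tuples
    target_policy   : callable (state, action) -> action probability under the target policy
    behavior_policy : callable (state, action) -> action probability under the behavior policy
    """
    returns = []
    for traj in trajectories:
        rho, value = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # cumulative product of per-step ratios pi_target / pi_behavior
            rho *= target_policy(s, a) / behavior_policy(s, a)
            value += (gamma ** t) * rho * r
        returns.append(value)
    return float(np.mean(returns))
```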

Accountable off-policy evaluation with kernel Bellman statistics

Y Feng, T Ren, Z Tang, Q Liu - International Conference on …, 2020 - proceedings.mlr.press
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy
from observed data collected from previous experiments, without requiring the execution of …

Adaptive estimator selection for off-policy evaluation

Y Su, P Srinath… - … Conference on Machine …, 2020 - proceedings.mlr.press
We develop a generic data-driven method for estimator selection in off-policy policy
evaluation settings. We establish a strong performance guarantee for the method, showing …

An instrumental variable approach to confounded off-policy evaluation

Y Xu, J Zhu, C Shi, S Luo… - … Conference on Machine …, 2023 - proceedings.mlr.press
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …

Off-policy confidence interval estimation with confounded Markov decision process

C Shi, J Zhu, Y Shen, S Luo, H Zhu… - Journal of the American …, 2024 - Taylor & Francis
This article is concerned with constructing a confidence interval for a target policy's value
offline based on pre-collected observational data in infinite-horizon settings. Most of the …

Understanding the curse of horizon in off-policy evaluation via conditional importance sampling

Y Liu, PL Bacon, E Brunskill - International Conference on …, 2020 - proceedings.mlr.press
Off-policy policy estimators that use importance sampling (IS) can suffer from high variance
in long-horizon domains, and there has been particular excitement over new IS methods that …
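
The high variance this entry refers to arises because the trajectory-level importance weight is a product of per-step ratios, so its variance typically grows exponentially with the horizon (the "curse of horizon"). A tiny simulation under an illustrative assumption (i.i.d. per-step ratios of 1.4 or 0.6, each with probability 1/2, so the mean ratio is exactly 1) makes the growth visible:

```python
import numpy as np

rng = np.random.default_rng(0)

def trajectory_weights(horizon, n_trajectories=100_000):
    # Illustrative assumption: each per-step ratio is 1.4 or 0.6 with equal
    # probability (mean 1); the cumulative weight is their product over the horizon.
    steps = rng.choice([1.4, 0.6], size=(n_trajectories, horizon))
    return steps.prod(axis=1)

for H in (1, 5, 20, 50):
    w = trajectory_weights(H)
    print(f"H={H:3d}  mean={w.mean():.3f}  var={w.var():.2e}")
```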

Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning

N Kallus, M Uehara - arXiv preprint arXiv:1909.05850, 2019 - arxiv.org
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and
infinite-horizon settings due to diminishing overlap between behavior and target policies. In …