We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only …
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on …
To estimate the value functions of policies from exploratory data, most model-free off-policy algorithms rely on importance sampling, where the use of importance sampling ratios often …
Y Feng, T Ren, Z Tang, Q Liu - International Conference on …, 2020 - proceedings.mlr.press
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments, without requiring the execution of …
Y Su, P Srinath… - … Conference on Machine …, 2020 - proceedings.mlr.press
We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing …
Y Xu, J Zhu, C Shi, S Luo… - … Conference on Machine …, 2023 - proceedings.mlr.press
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many …
C Shi, J Zhu, Y Shen, S Luo, H Zhu… - Journal of the American …, 2024 - Taylor & Francis
This article is concerned with constructing a confidence interval for a target policy's value offline based on pre-collected observational data in infinite-horizon settings. Most of the …
Off-policy policy estimators that use importance sampling (IS) can suffer from high variance in long-horizon domains, and there has been particular excitement over new IS methods that …
N Kallus, M Uehara - arXiv preprint arXiv:1909.05850, 2019 - arxiv.org
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In …