Empirical analysis of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, Y Yue - Real-world Sequential Decision …, 2019 - realworld-sdm.github.io
Off-policy policy evaluation (OPE) is the task of predicting the online performance of a policy
using only pre-collected historical data (collected from an existing deployed policy or set of …

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

H Kiyohara, R Kishimoto, K Kawakami… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces SCOPE-RL, a comprehensive open-source Python software designed
for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection …

Counterfactual-augmented importance sampling for semi-offline policy evaluation

S Tang, J Wiens - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative
evaluation using observational data can help practitioners understand the generalization …

An instrumental variable approach to confounded off-policy evaluation

Y Xu, J Zhu, C Shi, S Luo… - … Conference on Machine …, 2023 - proceedings.mlr.press
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …

Consistent on-line off-policy evaluation

A Hallak, S Mannor - International Conference on Machine …, 2017 - proceedings.mlr.press
The problem of on-line off-policy evaluation (OPE) has been actively studied in the last
decade due to its importance both as a stand-alone problem and as a module in a policy …

Policy-adaptive estimator selection for off-policy evaluation

T Udagawa, H Kiyohara, Y Narita, Y Saito… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …

Benchmarks for deep off-policy evaluation

J Fu, M Norouzi, O Nachum, G Tucker, Z Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline
datasets for both evaluating and selecting complex policies for decision making. The ability …

A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Reliable off-policy evaluation for reinforcement learning

J Wang, R Gao, H Zha - Operations Research, 2024 - pubsonline.informs.org
In a sequential decision-making problem, off-policy evaluation estimates the expected
cumulative reward of a target policy using logged trajectory data generated from a different …
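Several of the entries above define OPE the same way: estimating the expected cumulative reward of a target (evaluation) policy from trajectories logged by a different behavior policy. A minimal sketch of the ordinary per-trajectory importance-sampling estimator, the baseline most of these papers build on, is below; the function names and the toy bandit data are illustrative, not drawn from any of the listed papers:

```python
import numpy as np

def per_trajectory_is(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampling OPE estimate of the target policy's return.

    trajectories: list of trajectories, each a list of (state, action, reward)
                  tuples logged under the behavior policy pi_b.
    pi_e, pi_b:   functions (state, action) -> action probability under the
                  evaluation and behavior policies, respectively.
    gamma:        discount factor.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight the logged trajectory by the likelihood ratio of the
            # two policies, accumulated over time steps.
            weight *= pi_e(s, a) / pi_b(s, a)
            ret += gamma**t * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy one-step bandit: behavior policy is uniform over two actions,
# target policy always picks action 1, which yields reward 1.
logged = [[(None, 0, 0.0)], [(None, 1, 1.0)]]
pi_b = lambda s, a: 0.5
pi_e = lambda s, a: 1.0 if a == 1 else 0.0
estimate = per_trajectory_is(logged, pi_e, pi_b)  # → 1.0, the true value of pi_e
```

The estimator is unbiased whenever the behavior policy puts positive probability on every action the target policy can take, but its variance grows quickly with horizon length, which is the practical weakness that the benchmark and doubly-robust works listed above investigate.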