Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Benchmarks for deep off-policy evaluation

J Fu, M Norouzi, O Nachum, G Tucker, Z Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline
datasets for both evaluating and selecting complex policies for decision making. The ability …

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Off-policy evaluation via off-policy classification

A Irpan, K Rao, K Bousmalis, C Harris… - Advances in …, 2019 - proceedings.neurips.cc
In this work, we consider the problem of model selection for deep reinforcement learning
(RL) in real-world environments. Typically, the performance of deep RL algorithms is …

SUNRISE: A simple unified framework for ensemble learning in deep reinforcement learning

K Lee, M Laskin, A Srinivas… - … Conference on Machine …, 2021 - proceedings.mlr.press
Off-policy deep reinforcement learning (RL) has been successful in a range of challenging
domains. However, standard off-policy RL algorithms can suffer from several issues, such as …

Interpretable off-policy evaluation in reinforcement learning by highlighting influential transitions

O Gottesman, J Futoma, Y Liu… - International …, 2020 - proceedings.mlr.press
Off-policy evaluation in reinforcement learning offers the chance of using observational data
to improve future outcomes in domains such as healthcare and education, but safe …

P3O: Policy-on policy-off policy optimization

R Fakoor, P Chaudhari… - Uncertainty in artificial …, 2020 - proceedings.mlr.press
On-policy reinforcement learning (RL) algorithms have high sample complexity while off-policy
algorithms are difficult to tune. Merging the two holds the promise to develop efficient …

Uncertainty weighted actor-critic for offline reinforcement learning

Y Wu, S Zhai, N Srivastava, J Susskind, J Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Offline Reinforcement Learning promises to learn effective policies from previously-collected,
static datasets without the need for exploration. However, existing Q-learning and …

Advantage-weighted regression: Simple and scalable off-policy reinforcement learning

XB Peng, A Kumar, G Zhang, S Levine - arXiv preprint arXiv:1910.00177, 2019 - arxiv.org
In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that
uses standard supervised learning methods as subroutines. Our goal is an algorithm that …