Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

RAMBO-RL: Robust adversarial model-based offline reinforcement learning

M Rigter, B Lacerda, N Hawes - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to find performant policies from logged data without
further environment interaction. Model-based algorithms, which learn a model of the …

Policy learning with observational data

S Athey, S Wager - Econometrica, 2021 - Wiley Online Library
In many areas, practitioners seek to use observational data to learn a treatment assignment
policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or …

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

Offline reinforcement learning: Fundamental barriers for value function approximation

DJ Foster, A Krishnamurthy, D Simchi-Levi… - arXiv preprint arXiv …, 2021 - arxiv.org
We consider the offline reinforcement learning problem, where the aim is to learn a decision
making policy from logged data. Offline RL--particularly when coupled with (value) function …

Optimal treatment regimes: a review and empirical comparison

Z Li, J Chen, E Laber, F Liu… - International Statistical …, 2023 - Wiley Online Library
A treatment regime is a sequence of decision rules, one per decision point, that maps
accumulated patient information to a recommended intervention. An optimal treatment …

Off-policy policy evaluation for sequential decisions under unobserved confounding

H Namkoong, R Keramati… - Advances in Neural …, 2020 - proceedings.neurips.cc
When observed decisions depend only on observed features, off-policy policy evaluation
(OPE) methods for sequential decision problems can estimate the performance of evaluation …

On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Active offline policy selection

K Konyushova, Y Chen, T Paine… - Advances in …, 2021 - proceedings.neurips.cc
This paper addresses the problem of policy selection in domains with abundant logged data,
but with a restricted interaction budget. Solving this problem would enable safe evaluation …