A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Reinforcement learning in low-rank mdps with density features

A Huang, J Chen, N Jiang - International Conference on …, 2023 - proceedings.mlr.press
MDPs with low-rank transitions—that is, the transition matrix can be factored into the product
of two matrices, left and right—is a highly representative structure that enables tractable …

Minimax Instrumental Variable Regression and Convergence Guarantees without Identification or Closedness

A Bennett, N Kallus, X Mao, W Newey… - The Thirty Sixth …, 2023 - proceedings.mlr.press
In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.
Recently, many flexible machine learning methods have been developed for instrumental …

Offline minimax soft-q-learning under realizability and partial coverage

M Uehara, N Kallus, JD Lee… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider offline reinforcement learning (RL) where we have access only to offline
data. In contrast to numerous offline RL algorithms that necessitate the uniform coverage of …

Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning

N Kallus, M Uehara - arXiv preprint arXiv:1909.05850, 2019 - arxiv.org
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and
infinite-horizon settings due to diminishing overlap between behavior and target policies. In …

The optimal approximation factors in misspecified off-policy value function estimation

P Amortila, N Jiang… - … Conference on Machine …, 2023 - proceedings.mlr.press
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative
blow-up factors with respect to the misspecification error of function approximation. Yet, the …

Orthogonalized Estimation of Difference of Q-functions

D Cao, A Zhou - arXiv preprint arXiv:2406.08697, 2024 - arxiv.org
Offline reinforcement learning is important in many settings with available observational data
but the inability to deploy new policies online due to safety, cost, and other concerns. Many …

A Theory of Learnability for Offline Decision Making

C Mao, Q Zhang - arXiv preprint arXiv:2406.01378, 2024 - arxiv.org
We study the problem of offline decision making, which focuses on learning decisions from
datasets only partially correlated with the learning objective. While previous research has …

Offline Reinforcement Learning with Additional Covering Distributions

C Mao - arXiv preprint arXiv:2305.12679, 2023 - arxiv.org
We study learning optimal policies from a logged dataset, i.e., offline RL, with function
approximation. Despite the efforts devoted, existing algorithms with theoretic finite-sample …

Reinforcement learning under general function approximation and novel interaction settings

J Chen - 2023 - ideals.illinois.edu
Reinforcement Learning (RL) is an area of machine learning where an intelligent agent
solves sequential decision-making problems based on experience. Recent advances in the …