A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

A review of recent advances in empirical likelihood

P Liu, Y Zhao - Wiley Interdisciplinary Reviews: Computational …, 2023 - Wiley Online Library
Empirical likelihood is widely used in many statistical problems. In this article, we provide a
review of the empirical likelihood method, due to its significant development in recent years …

OptiDICE: Offline policy optimization via stationary distribution correction estimation

J Lee, W Jeon, B Lee, J Pineau… - … Conference on Machine …, 2021 - proceedings.mlr.press
We consider the offline reinforcement learning (RL) setting where the agent aims to optimize
the policy solely from the data without further environment interactions. In offline RL, the …

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

VOCE: Variational optimization with conservative estimation for offline safe reinforcement learning

J Guan, G Chen, J Ji, L Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline safe reinforcement learning (RL) algorithms promise to learn policies that satisfy
safety constraints directly in offline datasets without interacting with the environment. This …

Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters

K Ghasemipour, SS Gu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Motivated by the success of ensembles for uncertainty estimation in supervised learning, we
take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary …

Toward theoretical understandings of robust Markov decision processes: Sample complexity and asymptotics

W Yang, L Zhang, Z Zhang - The Annals of Statistics, 2022 - projecteuclid.org
Published in The Annals of Statistics, 2022, Vol. 50, No. 6, pp. 3223–3248.

A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes

C Shi, M Uehara, J Huang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Although they hold the promise of overcoming the …