Doubly robust distributionally robust off-policy evaluation and learning

N Kallus, X Mao, K Wang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
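The snippet cuts off before the method, but the doubly robust (DR) construction the title names is standard for contextual bandits; below is a minimal sketch, assuming logged tuples of context, action, reward, and logging propensity, plus a fitted reward model q_hat and a target policy pi_e (all names hypothetical, not the paper's code).

import numpy as np

def dr_policy_value(contexts, actions, rewards, propensities, pi_e, q_hat):
    """Doubly robust off-policy value estimate for a contextual bandit.

    pi_e(x) -> array of target-policy action probabilities for context x.
    q_hat(x, a) -> fitted estimate of the expected reward of action a at x.
    Consistent if either the propensities or the reward model is correct.
    """
    scores = np.empty(len(rewards))
    for i, (x, a, r, p) in enumerate(zip(contexts, actions, rewards, propensities)):
        probs = pi_e(x)
        # direct-method term: model-based value of the target policy at x
        dm = float(np.dot(probs, [q_hat(x, k) for k in range(len(probs))]))
        # importance-weighted residual corrects the model's bias on logged data
        scores[i] = dm + probs[a] / p * (r - q_hat(x, a))
    return float(scores.mean())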

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …
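For reference, the sequential doubly robust form this line of work builds on combines cumulative importance ratios with an estimated Q-function q and its policy average v; a sketch in standard notation (not the paper's exact display):

\hat V_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=0}^{T}\gamma^{t}\Bigl[\rho_{t}^{(i)}\bigl(r_{t}^{(i)} - q(s_{t}^{(i)}, a_{t}^{(i)})\bigr) + \rho_{t-1}^{(i)}\, v(s_{t}^{(i)})\Bigr],
\qquad \rho_{t}^{(i)} = \prod_{k=0}^{t}\frac{\pi_{e}(a_{k}^{(i)} \mid s_{k}^{(i)})}{\pi_{b}(a_{k}^{(i)} \mid s_{k}^{(i)})}, \quad \rho_{-1}^{(i)} = 1,

with v(s) = \mathbb{E}_{a \sim \pi_{e}}[q(s, a)]. In the Markov setting the paper replaces the cumulative ratios with marginal density ratios and cross-fits the nuisances, which is how it attains the semiparametric efficiency bound.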

Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders

A Bennett, N Kallus, L Li… - … Conference on Artificial …, 2021 - proceedings.mlr.press
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings
where experimentation is limited, such as healthcare. But, in these very same settings …

Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning

N Kallus, M Uehara - Operations Research, 2022 - pubsonline.informs.org
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and
infinite-horizon settings due to diminishing overlap between behavior and target policies. In …
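The "curse of horizon" refers to the variance of cumulative products of per-step importance ratios, which grows exponentially with trajectory length. A hedged sketch of the marginalized alternative, assuming an estimated stationary density ratio is supplied (e.g. by a minimax method; the ratio estimator itself is out of scope here):

import numpy as np

def mis_average_reward(states, actions, rewards, state_ratio, action_ratio):
    """Marginalized importance sampling for infinite-horizon OPE.

    state_ratio(s)  ~ d_pi(s) / d_b(s), an estimated stationary density ratio.
    action_ratio(s, a) = pi_e(a | s) / pi_b(a | s), a single per-step ratio.
    One per-transition weight replaces the cumulative product of ratios
    whose variance explodes with the horizon.
    """
    w = np.array([state_ratio(s) * action_ratio(s, a)
                  for s, a in zip(states, actions)])
    return float(np.sum(w * np.asarray(rewards, dtype=float)) / np.sum(w))  # self-normalized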

[CITATION][C] Near optimal provable uniform convergence in off-policy evaluation for reinforcement learning

M Yin, Y Bai, YX Wang - arXiv preprint arXiv:2007.03760, 2020

Intrinsically efficient, stable, and bounded off-policy evaluation for reinforcement learning

N Kallus, M Uehara - Advances in neural information …, 2019 - proceedings.neurips.cc
Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows
one to evaluate novel decision policies without needing to conduct exploration, which is …
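The boundedness in the title is the kind self-normalized estimators enjoy; a minimal illustration of the contrast with plain IPW (not the paper's estimator, which generalizes this idea):

import numpy as np

def ipw_value(rewards, weights):
    # plain IPW: unbiased, but the estimate can leave the observed reward range
    return float(np.mean(np.asarray(weights) * np.asarray(rewards)))

def snipw_value(rewards, weights):
    # self-normalized IPW: dividing by the realized weight sum keeps the
    # estimate inside [min(rewards), max(rewards)], i.e. bounded and stable
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * np.asarray(rewards)) / np.sum(w))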

[CITATION][C] Efficiently breaking the curse of horizon: Double reinforcement learning in infinite-horizon processes

N Kallus, M Uehara - arXiv preprint arXiv:1909.05850, 2019

Off-policy evaluation via adaptive weighting with data from contextual bandits

R Zhan, V Hadad, DA Hirshberg, S Athey - Proceedings of the 27th ACM …, 2021 - dl.acm.org
It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …
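A hedged sketch of the general recipe as I read the abstract: average per-observation AIPW scores under evaluator-chosen adaptive weights h_i, where each score is computed with the propensity in force when that observation was logged; the paper's contribution is the variance-stabilizing choice of h, which is not reproduced here.

import numpy as np

def weighted_aipw_value(aipw_scores, h):
    """Adaptively weighted estimate from per-observation AIPW scores.

    aipw_scores[i]: doubly robust score for observation i.
    h[i]: nonnegative adaptive weight; h = ones recovers the plain average.
    """
    h = np.asarray(h, dtype=float)
    return float(np.sum(h * np.asarray(aipw_scores, dtype=float)) / np.sum(h))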

State-action similarity-based representations for off-policy evaluation

B Pavse, J Hanna - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the
expected return of an evaluation policy given a fixed dataset that was collected by running …
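Such a fixed dataset is typically consumed by fitted Q-evaluation (FQE), the model-free OPE backbone that representation methods like this one plug into; a tabular sketch for intuition (hypothetical names, small state spaces only; the paper's setting uses learned representations with function approximation):

import numpy as np

def fqe_tabular(transitions, pi_e, n_states, n_actions, gamma=0.99, sweeps=200):
    """Fitted Q-evaluation on a fixed dataset of (s, a, r, s_next) tuples.

    pi_e: (n_states, n_actions) array of target-policy probabilities.
    Repeatedly regresses q(s, a) onto the one-step Bellman target under
    pi_e; here 'regression' is just averaging within each (s, a) cell.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        totals = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s_next in transitions:
            totals[s, a] += r + gamma * float(np.dot(pi_e[s_next], q[s_next]))
            counts[s, a] += 1
        seen = counts > 0
        q[seen] = totals[seen] / counts[seen]
    return q  # evaluate as np.dot(pi_e[s0], q[s0]) for a start state s0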

Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation

Y Saito, S Aihara, M Matsutani, Y Narita - arXiv preprint arXiv:2008.07146, 2020 - arxiv.org
Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using
data generated by a different policy. Because of its huge potential impact in practice, there …
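A minimal sketch of the basic IPW estimate such a pipeline computes, written against logged-feedback fields in the style the project documents (action, reward, and logging propensity "pscore"); the field names and the 2-D shape of action_dist are my simplifying assumptions and should be checked against the library itself:

import numpy as np

def ipw_from_logs(action, reward, pscore, action_dist):
    """IPW value of a target policy from logged bandit feedback.

    action, reward, pscore: NumPy arrays of logged actions (int), rewards,
    and the logging policy's probability of each logged action.
    action_dist[i, a]: target policy's probability of action a in round i.
    """
    n = len(reward)
    w = action_dist[np.arange(n), action] / pscore  # per-round importance weight
    return float(np.mean(w * reward))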