N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible …
A Bennett, N Kallus, L Li… - … Conference on Artificial …, 2021 - proceedings.mlr.press
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings where experimentation is limited, such as healthcare. But, in these very same settings …
N Kallus, M Uehara - Operations Research, 2022 - pubsonline.informs.org
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In …
N Kallus, M Uehara - Advances in neural information …, 2019 - proceedings.neurips.cc
Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is …
It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment …
B Pavse, J Hanna - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running …
Y Saito, S Aihara, M Matsutani, Y Narita - arXiv preprint arXiv:2008.07146, 2020 - arxiv.org
Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. Because of its huge potential impact in practice, there …
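The snippets above all define OPE as estimating the value of an evaluation policy from data logged by a different behavior policy. As a minimal illustration of the idea they share, here is a sketch of the classic inverse propensity scoring (IPS) estimator for a contextual bandit; the function names and the toy logging setup are illustrative, not from any of the cited papers:

```python
import random

def ips_estimate(logs, pi_e):
    """Inverse propensity scoring OPE for a contextual bandit.

    logs: list of (context, action, reward, behavior_prob) tuples,
          where behavior_prob is the probability the behavior policy
          assigned to the logged action.
    pi_e: function (context, action) -> probability under the
          evaluation policy.
    Returns an unbiased estimate of the evaluation policy's value.
    """
    total = 0.0
    for x, a, r, p_b in logs:
        total += (pi_e(x, a) / p_b) * r  # reweight reward by the density ratio
    return total / len(logs)

# Toy log: two actions, behavior policy uniform (prob 0.5 each),
# action 1 always yields reward 1, action 0 yields reward 0.
random.seed(0)
logs = []
for _ in range(10_000):
    x = random.random()
    a = random.randint(0, 1)
    r = 1.0 if a == 1 else 0.0
    logs.append((x, a, r, 0.5))

# Evaluation policy: deterministically pick action 1 (true value = 1.0).
pi_e = lambda x, a: 1.0 if a == 1 else 0.0
v_hat = ips_estimate(logs, pi_e)
print(round(v_hat, 1))  # close to the true value 1.0
```

The long-horizon difficulty noted in the Kallus–Uehara snippets comes from the RL analogue of this estimator, where the ratio `pi_e / p_b` is a product over all steps of a trajectory and its variance grows with the horizon.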