B Dai, O Nachum, Y Chow, L Li, C Szepesvári… - Proceedings of the 34th …, 2020 - dl.acm.org
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …