N Kallus, M Uehara - Advances in Neural Information Processing Systems, 2019 - par.nsf.gov
Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows
one to evaluate novel decision policies without needing to conduct exploration, which is …