C Shi, R Wan, V Chernozhukov, R Song - arXiv preprint arXiv:2105.04646, 2021 - arxiv.org
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …
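The snippet defines off-policy evaluation as estimating a target policy's value from data logged under a different behavior policy. The simplest standard point estimator for this is ordinary importance sampling (inverse propensity scoring). The sketch below applies it to a one-step (bandit) setting. It is NOT the method proposed in this paper. The function name `ips_estimate` and the toy policies and data are hypothetical. The sketch also assumes that the behavior policy's action probabilities are known and are positive wherever the target policy's probabilities are.

```python
def ips_estimate(logged, target_prob, behavior_prob):
    """Ordinary importance-sampling estimate of a target policy's value.

    logged: list of (context, action, reward) tuples collected under the
        behavior policy (hypothetical one-step data format).
    target_prob(context, action): probability the target policy takes `action`.
    behavior_prob(context, action): probability the behavior policy took it
        (assumed known and > 0 wherever target_prob > 0).
    """
    if not logged:
        raise ValueError("need at least one logged sample")
    total = 0.0
    for context, action, reward in logged:
        # Reweight each logged reward by the likelihood ratio between the
        # target and behavior policies for the logged action.
        weight = target_prob(context, action) / behavior_prob(context, action)
        total += weight * reward
    return total / len(logged)


# Toy example (hypothetical): the behavior policy picks action 0 or 1
# uniformly at random; the target policy always picks action 1.
def behavior(context, action):
    return 0.5


def target(context, action):
    return 1.0 if action == 1 else 0.0


logged = [(None, 0, 0.0), (None, 1, 1.0), (None, 1, 1.0), (None, 0, 0.0)]
print(ips_estimate(logged, target, behavior))  # prints 1.0
```

Samples with logged action 0 get weight 0, because the target policy never takes that action. Samples with logged action 1 get weight 2, which compensates for the behavior policy choosing that action only half the time. The estimate is therefore unbiased for the target policy's value, but its variance can grow large when the two policies differ substantially.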