C Shi, J Zhu, Y Shen, S Luo, H Zhu, R Song - arXiv e-prints, 2022 - ui.adsabs.harvard.edu
This paper is concerned with constructing a confidence interval for a target policy's value
offline based on pre-collected observational data in infinite-horizon settings. Most of the …