X Chen, Z Qi - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision
process with continuous states and actions. We recast the $Q$-function estimation into a …