Y Park, N Yoshida - arXiv preprint arXiv:2304.04170, 2023 - arxiv.org
In bandit algorithms, the randomly time-varying adaptive experimental design makes it
difficult to apply traditional limit theorems to off-policy evaluation of the treatment effect …