Statistical bootstrapping for uncertainty estimation in off-policy evaluation

I Kostrikov, O Nachum - arXiv preprint arXiv:2007.13609, 2020 - arxiv.org
… of Efron’s bootstrap for computing confidence intervals with respect to the direct method (DM)
for off-policy evaluation. Our theoretical results show that Efron’s bootstrap is valid given …

Bootstrapping fitted Q-evaluation for off-policy inference

B Hao, X Ji, Y Duan, H Lu… - International …, 2021 - proceedings.mlr.press
bootstrapping in off-policy evaluation (OPE), and in particular, we focus on the fitted Q-evaluation
(… the on-policy case, our bootstrap-based confidence interval still has a clear advantage …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
… For critical applications, this might be troublesome [94] and thus necessitates obtaining
confidence intervals … Statistical Bootstrapping: An important advantage of having constructed an …

CoinDICE: Off-policy confidence interval estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in neural …, 2020 - proceedings.neurips.cc
… -confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal
is to estimate a confidence interval on … inequalities and the bootstrap applied to importance …

Online bootstrap inference for policy evaluation in reinforcement learning

P Ramprasad, Y Li, Z Yang, Z Wang… - Journal of the …, 2023 - Taylor & Francis
… on-policy (TD) and off-policy (GTD) reinforcement learning algorithms. Second, we prove
that the confidence intervals constructed using our bootstrap algorithm are asymptotically valid. …

Non-asymptotic confidence intervals of off-policy evaluation: Primal and dual bounds

Y Feng, Z Tang, N Zhang, Q Liu - arXiv preprint arXiv:2103.05741, 2021 - arxiv.org
… non-asymptotic confidence intervals in infinite-horizon off-policy evaluation, which remains a
… , bootstrapping is time consuming since it requires to repeat the whole off-policy evaluation

Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
interval. That said, these generalization bounds are typically loose for practical purposes, and
we handle statistical errors by bootstrapping … practically useful confidence intervals that are …

Off-policy evaluation via off-policy classification

A Irpan, K Rao, K Bousmalis, C Harris… - Advances in …, 2019 - proceedings.neurips.cc
… We propose an off-policy evaluation method connecting off-policy evaluation to estimating
validation error for a positive-unlabeled (… Shaded regions are a 95% confidence interval. …

Combining parametric and nonparametric models for off-policy evaluation

O Gottesman, Y Liu, S Sussex… - International …, 2019 - proceedings.mlr.press
… We consider a model-based approach to perform batch off-policy evaluation in reinforcement
learning. Our method takes a mixture-… Bootstrapping with models: Confidence intervals for …

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - … conference on machine …, 2021 - proceedings.mlr.press
… significantly from having a confidence interval (CI) that quantifies … This motivates us to study
the off-policy evaluation (OPE) … This is because the standard bootstrap method is not valid …