Bootstrapping with models: Confidence intervals for off-policy evaluation - Academic Resource Search

Statistical bootstrapping for uncertainty estimation in off-policy evaluation

I Kostrikov, O Nachum - arXiv preprint arXiv:2007.13609, 2020 - arxiv.org

… of Efron’s bootstrap for computing confidence intervals with respect to the direct method (DM)
for off-policy evaluation. Our theoretical results show that Efron’s bootstrap is valid given …

Cited by 29 · Related articles · All 2 versions

[PDF] mlr.press

Bootstrapping fitted Q-evaluation for off-policy inference

B Hao, X Ji, Y Duan, H Lu… - International …, 2021 - proceedings.mlr.press

… bootstrapping in off-policy evaluation (OPE), and in particular, we focus on the fitted Q-evaluation
(… the on-policy case, our bootstrap-based confidence interval still has a clear advantage …

Cited by 39 · Related articles · All 6 versions

[PDF] neurips.cc

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc

… For critical applications, this might be troublesome [94] and thus necessitates obtaining
confidence intervals … Statistical Bootstrapping: An important advantage of having constructed an …

Cited by 51 · Related articles · All 11 versions

[PDF] neurips.cc

CoinDICE: Off-policy confidence interval estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in neural …, 2020 - proceedings.neurips.cc

… -confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal
is to estimate a confidence interval on … inequalities and the bootstrap applied to importance …

Cited by 85 · Related articles · All 13 versions

[PDF] arxiv.org

Online bootstrap inference for policy evaluation in reinforcement learning

P Ramprasad, Y Li, Z Yang, Z Wang… - Journal of the …, 2023 - Taylor & Francis

… on-policy (TD) and off-policy (GTD) reinforcement learning algorithms. Second, we prove
that the confidence intervals constructed using our bootstrap algorithm are asymptotically valid. …

Cited by 33 · Related articles · All 9 versions

[PDF] arxiv.org

Non-asymptotic confidence intervals of off-policy evaluation: Primal and dual bounds

Y Feng, Z Tang, N Zhang, Q Liu - arXiv preprint arXiv:2103.05741, 2021 - arxiv.org

… non-asymptotic confidence intervals in infinite-horizon off-policy evaluation, which remains a
… , bootstrapping is time consuming since it requires to repeat the whole off-policy evaluation …

Cited by 12 · Related articles · All 3 versions

[PDF] neurips.cc

Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information …, 2020 - proceedings.neurips.cc

… interval. That said, these generalization bounds are typically loose for practical purposes, and
we handle statistical errors by bootstrapping … practically useful confidence intervals that are …

Cited by 80 · Related articles · All 7 versions

[PDF] neurips.cc

Off-policy evaluation via off-policy classification

A Irpan, K Rao, K Bousmalis, C Harris… - Advances in …, 2019 - proceedings.neurips.cc

… We propose an off-policy evaluation method connecting off-policy evaluation to estimating
validation error for a positive-unlabeled (… Shaded regions are a 95% confidence interval. P γt0 …

Cited by 54 · Related articles · All 9 versions

[PDF] mlr.press

Combining parametric and nonparametric models for off-policy evaluation

O Gottesman, Y Liu, S Sussex… - International …, 2019 - proceedings.mlr.press

… We consider a model-based approach to perform batch off-policy evaluation in reinforcement
learning. Our method takes a mixture-… Bootstrapping with models: Confidence intervals for …

Cited by 33 · Related articles · All 13 versions

[PDF] mlr.press

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - … conference on machine …, 2021 - proceedings.mlr.press

… significantly from having a confidence interval (CI) that quantifies … This motivates us to study
the off-policy evaluation (OPE) … This is because the standard bootstrap method is not valid …

Cited by 38 · Related articles · All 5 versions
