… bootstrapping in off-policy evaluation (OPE), and in particular, we focus on the fitted Q-evaluation (… the on-policy case, our bootstrap-based confidence interval still has a clear advantage …
… For critical applications, this might be troublesome [94] and thus necessitates obtaining confidence intervals … Statistical Bootstrapping: An important advantage of having constructed an …
… -confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on … inequalities and the bootstrap applied to importance …
… on-policy (TD) and off-policy (GTD) reinforcement learning algorithms. Second, we prove that the confidence intervals constructed using our bootstrap algorithm are asymptotically valid. …
Y Feng, Z Tang, N Zhang, Q Liu - arXiv preprint arXiv:2103.05741, 2021 - arxiv.org
… non-asymptotic confidence intervals in infinite-horizon off-policy evaluation, which remains a … , bootstrapping is time consuming since it requires to repeat the whole off-policy evaluation …
N Jiang, J Huang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
… interval. That said, these generalization bounds are typically loose for practical purposes, and we handle statistical errors by bootstrapping … practically useful confidence intervals that are …
… We propose an off-policy evaluation method connecting off-policy evaluation to estimating validation error for a positive-unlabeled (… Shaded regions are a 95% confidence interval. P γt0 …
… We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-… Bootstrapping with models: Confidence intervals for …
… significantly from having a confidence interval (CI) that quantifies … This motivates us to study the off-policy evaluation (OPE) … This is because the standard bootstrap method is not valid …