Related articles - Academic resource search

CoinDICE: Off-policy confidence interval estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in neural …, 2020 - proceedings.neurips.cc

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …

Cited by 85 · All 13 versions

[PDF] neurips.cc

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in neural …, 2021 - proceedings.neurips.cc

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …

Cited by 37 · All 11 versions

[PDF] aaai.org

Bootstrapping with models: Confidence intervals for off-policy evaluation

J Hanna, P Stone, S Niekum - Proceedings of the AAAI Conference on …, 2017 - ojs.aaai.org

In many reinforcement learning applications, it is desirable to determine confidence interval
lower bounds on the performance of any given policy without executing said policy. In this …

Cited by 84 · All 18 versions
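The snippet above concerns lower confidence bounds on a policy's performance computed without executing that policy. As a generic illustration only (a plain percentile bootstrap, not the model-based bootstrapping the paper itself proposes), the sketch below turns a list of per-trajectory off-policy estimates, for example importance-weighted returns, into an approximate one-sided lower bound. The function name `bootstrap_lower_bound` and its parameters are hypothetical.

```python
import random
from typing import List


def bootstrap_lower_bound(values: List[float], delta: float = 0.05,
                          n_boot: int = 2000, seed: int = 0) -> float:
    """Approximate (1 - delta) one-sided lower bound on the mean of `values`
    using the percentile bootstrap.

    `values` holds one off-policy estimate per logged trajectory (for example,
    importance-weighted returns); their mean is the point estimate of the
    target policy's value. The bootstrap gives no finite-sample guarantee.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample trajectories with replacement and record the resampled mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # The delta-quantile of the bootstrap means serves as the lower bound.
    idx = min(n_boot - 1, int(delta * n_boot))
    return means[idx]
```

Because each bootstrap mean is an average of observed values, the bound always lies between the smallest value and the largest; with few trajectories or heavy-tailed importance weights it can be badly miscalibrated, which is one motivation for the more careful methods listed on this page.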

[PDF] mlr.press

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - … conference on machine …, 2021 - proceedings.mlr.press

Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …

Cited by 38 · All 5 versions

[PDF] mlr.press

Data-efficient off-policy policy evaluation for reinforcement learning

P Thomas, E Brunskill - International Conference on …, 2016 - proceedings.mlr.press

In this paper we present a new way of predicting the performance of a reinforcement
learning policy given historical data that may have been generated by a different policy. The …

Cited by 729 · All 14 versions
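The entry above describes predicting a policy's performance from data generated by a different policy. The textbook baseline for this is per-trajectory importance sampling (IS) and its weighted variant (WIS); the sketch below shows only that baseline, not the estimator proposed in the paper. The trajectory format and function names are hypothetical, and the behavior policy's action probabilities are assumed to be known.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical data format: a trajectory is a list of (state, action, reward) steps.
Trajectory = List[Tuple[int, int, float]]
# A policy is represented as a function returning the probability pi(a | s).
Policy = Callable[[int, int], float]


def trajectory_weight(traj: Trajectory, target: Policy, behavior: Policy) -> float:
    """Product over time steps of target(a | s) / behavior(a | s)."""
    w = 1.0
    for s, a, _ in traj:
        w *= target(s, a) / behavior(s, a)
    return w


def discounted_return(traj: Trajectory, gamma: float) -> float:
    """Sum of gamma**t * r_t along the trajectory."""
    return sum(gamma ** t * r for t, (_, _, r) in enumerate(traj))


def is_estimate(trajs: Sequence[Trajectory], target: Policy, behavior: Policy,
                gamma: float = 1.0) -> float:
    """Ordinary IS: unbiased when behavior(a|s) > 0 wherever target(a|s) > 0,
    but its variance can grow quickly with the horizon."""
    vals = [trajectory_weight(t, target, behavior) * discounted_return(t, gamma)
            for t in trajs]
    return sum(vals) / len(vals)


def wis_estimate(trajs: Sequence[Trajectory], target: Policy, behavior: Policy,
                 gamma: float = 1.0) -> float:
    """Weighted IS: normalizes by the total weight (biased, usually lower variance)."""
    ws = [trajectory_weight(t, target, behavior) for t in trajs]
    rets = [discounted_return(t, gamma) for t in trajs]
    total = sum(ws)
    # If no trajectory is compatible with the target policy, return 0.0 by convention.
    return sum(w * g for w, g in zip(ws, rets)) / total if total > 0 else 0.0


if __name__ == "__main__":
    # One-step example: behavior picks actions 0/1 uniformly, target always picks 1.
    behavior = lambda s, a: 0.5
    target = lambda s, a: 1.0 if a == 1 else 0.0
    data = [[(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 1, 1.0)], [(0, 0, 0.0)]]
    print(is_estimate(data, target, behavior))   # 1.0
    print(wis_estimate(data, target, behavior))  # 1.0
```

The two estimators agree on this balanced toy dataset; when the logged actions are unbalanced they diverge, which illustrates the bias/variance trade-off between them.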

[PDF] mlr.press

Minimax weight and Q-function learning for off-policy evaluation

M Uehara, J Huang, N Jiang - International Conference on …, 2020 - proceedings.mlr.press

We provide theoretical investigations into off-policy evaluation in reinforcement learning
using function approximators for (marginalized) importance weights and value functions. Our …

Cited by 183 · All 6 versions

[PDF] neurips.cc

Towards optimal off-policy evaluation for reinforcement learning with marginalized importance sampling

T Xie, Y Ma, YX Wang - Advances in neural information …, 2019 - proceedings.neurips.cc

Motivated by the many real-world applications of reinforcement learning (RL) that require
safe-policy iterations, we consider the problem of off-policy evaluation (OPE)---the problem …

Cited by 179 · All 10 versions

[PDF] mlr.press

Bootstrapping fitted Q-evaluation for off-policy inference

B Hao, X Ji, Y Duan, H Lu… - International …, 2021 - proceedings.mlr.press

Bootstrapping provides a flexible and effective approach for assessing the quality of batch
reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we …

Cited by 39 · All 6 versions

[PDF] mlr.press

Minimax-optimal off-policy evaluation with linear function approximation

Y Duan, Z Jia, M Wang - International Conference on …, 2020 - proceedings.mlr.press

This paper studies the statistical theory of off-policy evaluation with function approximation in
batch data reinforcement learning problem. We consider a regression-based fitted Q …

Cited by 163 · All 6 versions
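The entry above mentions a regression-based fitted Q-evaluation (FQE) approach with linear function approximation. A minimal sketch of generic iterative FQE is shown below, under the simplifying assumption of one-hot (state, action) features, the simplest linear feature map, for which each least-squares regression step reduces to averaging targets per (s, a). It is not the specific estimator or analysis from the paper; the function names and transition format are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical data format: (state, action, reward, next_state) transitions.
Transition = Tuple[int, int, float, int]
Policy = Callable[[int, int], float]  # returns pi(a | s)


def fitted_q_evaluation(data: List[Transition], actions: Sequence[int],
                        target: Policy, gamma: float,
                        n_iters: int = 200) -> Dict[Tuple[int, int], float]:
    """Iterative FQE with one-hot (state, action) features.

    Each iteration regresses Q(s, a) onto r + gamma * sum_a' pi(a'|s') Q(s', a').
    With one-hot features the least-squares solution is just the average
    target over all samples sharing the same (s, a).
    """
    q: Dict[Tuple[int, int], float] = {}  # pairs absent from the data stay at 0
    for _ in range(n_iters):
        sums: Dict[Tuple[int, int], float] = defaultdict(float)
        counts: Dict[Tuple[int, int], int] = defaultdict(int)
        for s, a, r, s2 in data:
            v_next = sum(target(s2, a2) * q.get((s2, a2), 0.0) for a2 in actions)
            sums[(s, a)] += r + gamma * v_next
            counts[(s, a)] += 1
        q = {k: sums[k] / counts[k] for k in sums}
    return q


def policy_value(q: Dict[Tuple[int, int], float], init_states: Sequence[int],
                 actions: Sequence[int], target: Policy) -> float:
    """Average over initial states of sum_a pi(a | s0) * Q(s0, a)."""
    return sum(sum(target(s, a) * q.get((s, a), 0.0) for a in actions)
               for s in init_states) / len(init_states)
```

For general features one would replace the per-(s, a) average with a least-squares solve over the feature matrix; the one-hot case keeps the sketch dependency-free while preserving the structure of the iteration.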

[PDF] mlr.press

Asymptotically efficient off-policy evaluation for tabular reinforcement learning

M Yin, YX Wang - International Conference on Artificial …, 2020 - proceedings.mlr.press

We consider the problem of off-policy evaluation for reinforcement learning, where the goal
is to estimate the expected reward of a target policy $\pi $ using offline data collected by …

Cited by 79 · All 5 versions
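The entry above concerns estimating a target policy's expected reward from offline data in the tabular setting. One generic way to do this, sketched below, is a certainty-equivalence (model-based) estimator for a finite-horizon MDP: fit empirical per-step transition and reward tables from the logged episodes, then evaluate the target policy in that fitted model by backward induction. This is an illustrative assumption, not a claim about the exact estimator analyzed in the paper; the function name and episode format are hypothetical.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical data format: an episode is a list of (s, a, r, s_next) steps.
Step = Tuple[int, int, float, int]
Policy = Callable[[int, int], float]  # returns pi(a | s)


def tabular_model_based_ope(episodes: List[List[Step]], states: Sequence[int],
                            actions: Sequence[int], target: Policy,
                            horizon: int) -> float:
    """Certainty-equivalence OPE in a finite-horizon, time-inhomogeneous tabular MDP.

    (t, s, a) triples never visited in the data contribute zero value.
    Episodes are assumed to be non-empty.
    """
    counts = defaultdict(int)          # (t, s, a) -> visit count
    next_counts = defaultdict(int)     # (t, s, a, s') -> transition count
    reward_sums = defaultdict(float)   # (t, s, a) -> summed reward
    init_counts = defaultdict(int)     # s0 -> number of episodes starting there
    for ep in episodes:
        init_counts[ep[0][0]] += 1
        for t, (s, a, r, s2) in enumerate(ep[:horizon]):
            counts[(t, s, a)] += 1
            next_counts[(t, s, a, s2)] += 1
            reward_sums[(t, s, a)] += r

    # Backward induction in the fitted model, starting from V_H = 0.
    v_next = {s: 0.0 for s in states}
    for t in reversed(range(horizon)):
        v = {}
        for s in states:
            total = 0.0
            for a in actions:
                n = counts[(t, s, a)]
                if n == 0:
                    continue
                q = reward_sums[(t, s, a)] / n + sum(
                    next_counts[(t, s, a, s2)] / n * v_next[s2] for s2 in states)
                total += target(s, a) * q
            v[s] = total
        v_next = v
    # Weight V_0 by the empirical initial-state distribution.
    n_eps = len(episodes)
    return sum(c / n_eps * v_next[s] for s, c in init_counts.items())
```

Dropping unvisited (t, s, a) triples is the simplest convention; it biases the estimate downward whenever the target policy puts mass on actions the behavior data never tried.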
