Bootstrapping with models: Confidence intervals for off-policy evaluation

J Hanna, P Stone, S Niekum. Proceedings of the AAAI Conference on Artificial Intelligence, 2017. ojs.aaai.org

Abstract

In many reinforcement learning applications, it is desirable to determine lower confidence bounds on the performance of a given policy without executing that policy. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data. We empirically evaluate the proposed methods on standard policy evaluation tasks.
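
To make the idea in the abstract concrete, the following is a minimal sketch of one way a model-based bootstrap lower bound could be computed. It is not taken from the paper and may differ from the authors' two methods. It assumes a small tabular MDP, trajectories stored as lists of `(state, action, reward, next_state)` tuples, a maximum-likelihood count-based model, and a percentile bootstrap. All function and parameter names (`fit_model`, `evaluate_in_model`, `bootstrap_lower_bound`, `n_boot`, `delta`) are hypothetical.

```python
import random


def fit_model(trajectories, n_states, n_actions):
    """Fit a maximum-likelihood tabular model from (s, a, r, s_next) tuples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for trajectory in trajectories:
        for s, a, r, s_next in trajectory:
            counts[s][a][s_next] += 1
            reward_sum[s][a] += r
            visits[s][a] += 1
    return counts, reward_sum, visits


def evaluate_in_model(model, policy, start_state, horizon, gamma):
    """Finite-horizon value of `policy` in the fitted model (backward induction).

    policy[s][a] is the probability of taking action a in state s.
    Assumption: state-action pairs never seen in the data contribute zero.
    """
    counts, reward_sum, visits = model
    n_states, n_actions = len(counts), len(counts[0])
    value = [0.0] * n_states
    for _ in range(horizon):
        new_value = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                n = visits[s][a]
                if n == 0:
                    continue
                mean_reward = reward_sum[s][a] / n
                next_value = sum(
                    counts[s][a][s2] * value[s2] for s2 in range(n_states)
                ) / n
                new_value[s] += policy[s][a] * (mean_reward + gamma * next_value)
        value = new_value
    return value[start_state]


def bootstrap_lower_bound(trajectories, policy, n_states, n_actions,
                          start_state=0, horizon=20, gamma=1.0,
                          n_boot=200, delta=0.05, seed=0):
    """Percentile-bootstrap lower bound at confidence level 1 - delta."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample whole trajectories with replacement, then refit the model.
        resample = rng.choices(trajectories, k=len(trajectories))
        model = fit_model(resample, n_states, n_actions)
        estimates.append(
            evaluate_in_model(model, policy, start_state, horizon, gamma)
        )
    estimates.sort()
    # Take the empirical delta-quantile of the bootstrap estimates.
    return estimates[int(delta * n_boot)]
```

In this sketch the data can come from any behavior policy, since the evaluation policy is only ever run inside the fitted model. The resulting bound is approximate: it inherits both the bootstrap's lack of a strict guarantee and any bias in the learned model.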
