Empirical analysis of off-policy policy evaluation for reinforcement learning

Authors

Cameron Voloshin, Hoang M Le, Yisong Yue

Publication date

2019

Journal

Real-world Sequential Decision Making Workshop at ICML

Volume

2019

Description

Off-policy policy evaluation (OPE) is the task of predicting the online performance of a policy using only pre-collected historical data (collected from an existing deployed policy or set of policies). For many real-world applications, accurate OPE is crucial since deploying bad policies can be prohibitively costly or dangerous. With the increasing interest in deploying learning-based methods for safety-critical applications, the study of OPE has also become correspondingly more important. In this paper, we present the first comprehensive empirical analysis of most of the recently proposed OPE methods. Based on thousands of experiments and detailed empirical analyses, we offer a summarized set of guidelines for effectively using OPE in practice, as well as suggest directions for future research to address current limitations.

Total citations

Cited by 5

Citations per year: 2020: 1, 2021: 1, 2022: 1, 2023: 2

Scholar articles

Empirical analysis of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, Y Yue - Real-world Sequential Decision Making Workshop at …, 2019
