Related articles - Academic resource search

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org

We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety critical applications …

Cited by 140 · Related articles · All 11 versions
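Off-policy policy evaluation (OPE), the problem studied in the entry above, estimates the value of a target policy using trajectories logged by a different behavior policy. As a minimal sketch of the idea, and not the method of any paper listed here, the ordinary trajectory-wise importance sampling estimator reweights each logged discounted return by the product of per-step probability ratios between the target and behavior policies. The function name `importance_sampling_ope`, the (state, action, reward) tuple layout, and the policy-as-callable interface are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical data layout: a trajectory is a list of (state, action, reward) tuples.
Trajectory = List[Tuple[int, int, float]]
# Hypothetical policy interface: returns the probability of `action` in `state`.
Policy = Callable[[int, int], float]


def importance_sampling_ope(
    trajectories: Sequence[Trajectory],
    target_policy: Policy,
    behavior_policy: Policy,
    gamma: float = 1.0,
) -> float:
    """Ordinary (trajectory-wise) importance sampling estimate of the target policy's value."""
    if not trajectories:
        raise ValueError("need at least one logged trajectory")
    estimates = []
    for traj in trajectories:
        weight = 1.0  # cumulative ratio pi_target / pi_behavior over the trajectory
        discounted_return = 0.0
        for t, (state, action, reward) in enumerate(traj):
            weight *= target_policy(state, action) / behavior_policy(state, action)
            discounted_return += (gamma ** t) * reward
        estimates.append(weight * discounted_return)
    # Average the reweighted returns over all logged trajectories.
    return sum(estimates) / len(estimates)
```

For example, with a uniform behavior policy over two actions and a target policy that always picks the action yielding reward 1, the logged trajectory that took the other action gets weight 0 and the matching one gets weight 2, so the average recovers a value of 1. This estimator is unbiased when the behavior policy has support wherever the target does, but its variance grows with the product of ratios over long horizons.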

[PDF] arxiv.org

A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Cited by 47 · Related articles · All 2 versions

[PDF] arxiv.org

Benchmarks for deep off-policy evaluation

J Fu, M Norouzi, O Nachum, G Tucker, Z Wang… - arXiv preprint arXiv …, 2021 - arxiv.org

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline
datasets for both evaluating and selecting complex policies for decision making. The ability …

Cited by 79 · Related articles · All 5 versions

[PDF] jmlr.org

Double reinforcement learning for efficient off-policy evaluation in markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Cited by 188 · Related articles · All 7 versions

[PDF] neurips.cc

Off-policy evaluation via off-policy classification

A Irpan, K Rao, K Bousmalis, C Harris… - Advances in …, 2019 - proceedings.neurips.cc

In this work, we consider the problem of model selection for deep reinforcement learning
(RL) in real-world environments. Typically, the performance of deep RL algorithms is …

Cited by 54 · Related articles · All 9 versions

[PDF] mlr.press

Sunrise: A simple unified framework for ensemble learning in deep reinforcement learning

K Lee, M Laskin, A Srinivas… - … Conference on Machine …, 2021 - proceedings.mlr.press

Off-policy deep reinforcement learning (RL) has been successful in a range of challenging
domains. However, standard off-policy RL algorithms can suffer from several issues, such as …

Cited by 237 · Related articles · All 6 versions

[PDF] mlr.press

Interpretable off-policy evaluation in reinforcement learning by highlighting influential transitions

O Gottesman, J Futoma, Y Liu… - International …, 2020 - proceedings.mlr.press

Off-policy evaluation in reinforcement learning offers the chance of using observational data
to improve future outcomes in domains such as healthcare and education, but safe …

Cited by 56 · Related articles · All 10 versions

[PDF] mlr.press

P3o: Policy-on policy-off policy optimization

R Fakoor, P Chaudhari… - Uncertainty in artificial …, 2020 - proceedings.mlr.press

On-policy reinforcement learning (RL) algorithms have high sample complexity while off-
policy algorithms are difficult to tune. Merging the two holds the promise to develop efficient …

Cited by 57 · Related articles · All 7 versions

[PDF] arxiv.org

Uncertainty weighted actor-critic for offline reinforcement learning

Y Wu, S Zhai, N Srivastava, J Susskind, J Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org

Offline Reinforcement Learning promises to learn effective policies from previously-
collected, static datasets without the need for exploration. However, existing Q-learning and …

Cited by 169 · Related articles · All 4 versions

[PDF] arxiv.org

Advantage-weighted regression: Simple and scalable off-policy reinforcement learning

XB Peng, A Kumar, G Zhang, S Levine - arXiv preprint arXiv:1910.00177, 2019 - arxiv.org

In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that
uses standard supervised learning methods as subroutines. Our goal is an algorithm that …

Cited by 469 · Related articles · All 6 versions
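The advantage-weighted regression entry above describes an algorithm that uses standard supervised learning as a subroutine. A rough sketch of that idea, under stated assumptions: each logged action is weighted by an exponentiated advantage exp((R - V) / beta), clipped at a cap, and the policy is then fit by weighted regression. Here the policy is assumed to be Gaussian with fixed variance and a 1D linear mean, so weighted maximum likelihood reduces to weighted least squares. The function names, the cap `w_max`, the temperature `beta`, and the caller-supplied baseline values are all hypothetical choices for illustration, not the paper's exact settings.

```python
import math
from typing import List, Sequence, Tuple


def awr_weights(
    returns: Sequence[float],
    values: Sequence[float],
    beta: float = 1.0,
    w_max: float = 20.0,
) -> List[float]:
    """Exponentiated-advantage weights min(exp((R - V) / beta), w_max)."""
    cap = math.log(w_max)
    # Clipping the exponent (rather than the result) avoids overflow for large advantages;
    # since exp is monotone, this equals min(exp(adv / beta), w_max).
    return [math.exp(min((r - v) / beta, cap)) for r, v in zip(returns, values)]


def weighted_linear_fit(
    xs: Sequence[float], ys: Sequence[float], ws: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least squares for a policy mean of the form action ≈ slope * state + intercept."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    states = [0.0, 1.0, 2.0, 3.0]
    actions = [0.5, 1.0, 2.5, 2.0]
    returns = [0.0, 1.0, 3.0, 1.0]
    values = [1.0] * len(states)  # hypothetical constant baseline V(s)
    weights = awr_weights(returns, values, beta=1.0)
    # High-return actions dominate the fit; low-return ones are down-weighted.
    print(weighted_linear_fit(states, actions, weights))
```

The temperature `beta` controls how sharply the fit concentrates on high-advantage actions, and the cap keeps a few outlier returns from dominating the regression.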
