Deep reinforcement learning for the control of robotic manipulation: a focussed mini-review

R Liu, F Nageotte, P Zanne, M de Mathelin… - Robotics, 2021 - mdpi.com
Deep learning has provided new ways of manipulating, processing and analyzing data. It
may sometimes achieve results comparable to, or surpassing, human expert performance …

Subgaussian and differentiable importance sampling for off-policy evaluation and learning

AM Metelli, A Russo, M Restelli - Advances in neural …, 2021 - proceedings.neurips.cc
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer
In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …
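Several of the entries above build on the same ordinary importance-sampling estimator: reweight each sample of behavior-policy data by the ratio of its probability under the target policy to its probability under the behavior policy. A minimal sketch, not taken from any of the cited papers; the one-step setting and all probability values are illustrative assumptions:

```python
import numpy as np

def is_estimate(rewards, target_probs, behavior_probs):
    """Ordinary importance-sampling estimate of the target policy's
    expected reward, computed from behavior-policy data.

    Sample i contributes rewards[i] weighted by the likelihood ratio
    target_probs[i] / behavior_probs[i]."""
    weights = np.asarray(target_probs) / np.asarray(behavior_probs)
    return float(np.mean(weights * np.asarray(rewards)))

# Toy example: three one-step trajectories collected under a behavior
# policy b, reweighted toward a target policy pi.
rewards = [1.0, 0.0, 2.0]
target_probs = [0.5, 0.2, 0.3]      # pi(a|s) for each sampled action
behavior_probs = [0.25, 0.5, 0.25]  # b(a|s) for each sampled action
print(is_estimate(rewards, target_probs, behavior_probs))  # ≈ 1.467
```

The heavy-tailed behavior of these likelihood ratios is precisely what the subgaussian-IS papers above aim to control.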

Beyond variance reduction: Understanding the true impact of baselines on policy optimization

W Chung, V Thomas, MC Machado… - … on Machine Learning, 2021 - proceedings.mlr.press
Bandit and reinforcement learning (RL) problems can often be framed as optimization
problems where the goal is to maximize average performance while having access only to …
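The classical framing this snippet alludes to: subtracting a constant baseline from the reward leaves the policy-gradient estimator unbiased but changes its variance (the paper argues baselines matter beyond this). A hedged sketch for a two-armed bandit with a uniform softmax policy, computed by exact enumeration over the two arms; all values here are illustrative assumptions, not from the paper:

```python
import numpy as np

probs = np.array([0.5, 0.5])     # uniform softmax policy over two arms
rewards = np.array([1.0, 0.0])   # deterministic reward per arm

def grad_log_pi(a):
    # Gradient of log softmax w.r.t. the logits: one_hot(a) - probs.
    g = -probs.copy()
    g[a] += 1.0
    return g

def mean_and_var(baseline):
    # Enumerate both arms exactly; look at the first gradient component.
    samples = np.array([(rewards[a] - baseline) * grad_log_pi(a)[0]
                        for a in range(2)])
    mean = np.dot(probs, samples)
    var = np.dot(probs, samples ** 2) - mean ** 2
    return mean, var

m0, v0 = mean_and_var(0.0)
m1, v1 = mean_and_var(0.5)
print(m0, v0)  # 0.25 0.0625
print(m1, v1)  # 0.25 0.0
```

The mean is identical for both baselines (unbiasedness), while the baseline 0.5 drives the variance of this component to zero in this toy case.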

Robust on-policy data collection for data-efficient policy evaluation

R Zhong, JP Hanna, L Schäfer… - … Learning Workshop at …, 2021 - offline-rl-neurips.github.io
This paper considers how to complement offline reinforcement learning (RL) data with
additional data collection for the task of policy evaluation. In policy evaluation, the task is to …

Lessons on off-policy methods from a notification component of a chatbot

S Rome, T Chen, M Kreisel, D Zhou - Machine Learning, 2021 - Springer
This work serves as a review of our experience applying off-policy techniques to train and
evaluate a contextual bandit model powering a troubleshooting notification in a chatbot …

Independence-aware advantage estimation

P Zhang, L Zhao, G Liu, J Bian, M Huang, T Qin, TY Liu - 2021 - openreview.net
Most existing advantage function estimation methods in reinforcement learning suffer from
the problem of high variance, which scales unfavorably with the time horizon. To address …
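The unfavorable scaling with the horizon can be seen in the simplest case: when per-step rewards are independent noisy terms, the variance of an undiscounted Monte-Carlo return grows linearly with the horizon. An illustration of that effect only, not of the paper's estimator; the reward distribution and sample counts are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def return_variance(horizon, n_rollouts=50_000):
    # Simulate n_rollouts trajectories whose per-step rewards are
    # i.i.d. N(1, 1); the undiscounted return is their sum, so its
    # variance equals the horizon.
    rewards = rng.normal(loc=1.0, scale=1.0, size=(n_rollouts, horizon))
    returns = rewards.sum(axis=1)
    return returns.var()

print(return_variance(10))   # ≈ 10
print(return_variance(100))  # ≈ 100
```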

Subgaussian importance sampling for off-policy evaluation and learning

AM Metelli, A Russo, M Restelli - ICML-21 Workshop on …, 2021 - lyang36.github.io
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …

Policy Optimization via Optimal Policy Evaluation

AM Metelli, S Meta, M Restelli - Deep RL Workshop NeurIPS 2021 - openreview.net
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed as a what-if …

Curriculum learning in reinforcement learning

SS Narvekar - 2021 - repositories.lib.utexas.edu
In recent years, reinforcement learning (RL) has been increasingly successful at solving
complex tasks. Despite these successes, one of the fundamental challenges is that many RL …