Deep reinforcement learning for the control of robotic manipulation: a focussed mini-review

R Liu, F Nageotte, P Zanne, M de Mathelin… - Robotics, 2021 - mdpi.com
Deep learning has provided new ways of manipulating, processing and analyzing data. It
may sometimes achieve results comparable to, or surpassing, human expert performance …

Subgaussian and differentiable importance sampling for off-policy evaluation and learning

AM Metelli, A Russo, M Restelli - Advances in neural …, 2021 - proceedings.neurips.cc
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer
In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …
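Several of the entries above build on the same ordinary importance-sampling estimator: reweight each sample of behavior-policy data by the ratio of its probability under the target policy to its probability under the behavior policy. A minimal sketch, not taken from any of the cited papers; the one-step setting and all probability values are illustrative assumptions:

```python
import numpy as np

def is_estimate(rewards, target_probs, behavior_probs):
    """Ordinary importance-sampling estimate of the target policy's
    expected reward, computed from behavior-policy data.

    Sample i contributes rewards[i] weighted by the likelihood ratio
    target_probs[i] / behavior_probs[i]."""
    weights = np.asarray(target_probs) / np.asarray(behavior_probs)
    return float(np.mean(weights * np.asarray(rewards)))

# Toy example: three one-step trajectories collected under a behavior
# policy b, reweighted toward a target policy pi.
rewards = [1.0, 0.0, 2.0]
target_probs = [0.5, 0.2, 0.3]      # pi(a|s) for each sampled action
behavior_probs = [0.25, 0.5, 0.25]  # b(a|s) for each sampled action
print(is_estimate(rewards, target_probs, behavior_probs))  # ≈ 1.467
```

The heavy-tailed behavior of these likelihood ratios is precisely what the subgaussian-IS papers above aim to control.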

Beyond variance reduction: Understanding the true impact of baselines on policy optimization

W Chung, V Thomas, MC Machado… - … on Machine Learning, 2021 - proceedings.mlr.press
Bandit and reinforcement learning (RL) problems can often be framed as optimization
problems where the goal is to maximize average performance while having access only to …
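The classical framing this snippet alludes to: subtracting a constant baseline from the reward leaves the policy-gradient estimator unbiased but changes its variance (the paper argues baselines matter beyond this). A hedged sketch for a two-armed bandit with a uniform softmax policy, computed by exact enumeration over the two arms; all values here are illustrative assumptions, not from the paper:

```python
import numpy as np

probs = np.array([0.5, 0.5])     # uniform softmax policy over two arms
rewards = np.array([1.0, 0.0])   # deterministic reward per arm

def grad_log_pi(a):
    # Gradient of log softmax w.r.t. the logits: one_hot(a) - probs.
    g = -probs.copy()
    g[a] += 1.0
    return g

def mean_and_var(baseline):
    # Enumerate both arms exactly; look at the first gradient component.
    samples = np.array([(rewards[a] - baseline) * grad_log_pi(a)[0]
                        for a in range(2)])
    mean = np.dot(probs, samples)
    var = np.dot(probs, samples ** 2) - mean ** 2
    return mean, var

m0, v0 = mean_and_var(0.0)
m1, v1 = mean_and_var(0.5)
print(m0, v0)  # 0.25 0.0625
print(m1, v1)  # 0.25 0.0
```

The mean is identical for both baselines (unbiasedness), while the baseline 0.5 drives the variance of this component to zero in this toy case.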

Robust on-policy data collection for data-efficient policy evaluation

R Zhong, JP Hanna, L Schäfer… - … Learning Workshop at …, 2021 - offline-rl-neurips.github.io
This paper considers how to complement offline reinforcement learning (RL) data with
additional data collection for the task of policy evaluation. In policy evaluation, the task is to …

Lessons on off-policy methods from a notification component of a chatbot

S Rome, T Chen, M Kreisel, D Zhou - Machine Learning, 2021 - Springer
This work serves as a review of our experience applying off-policy techniques to train and
evaluate a contextual bandit model powering a troubleshooting notification in a chatbot …

Independence-aware advantage estimation

P Zhang, L Zhao, G Liu, J Bian, M Huang, T Qin, TY Liu - 2021 - openreview.net
Most existing advantage function estimation methods in reinforcement learning suffer from
the problem of high variance, which scales unfavorably with the time horizon. To address …
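The unfavorable scaling with the horizon can be seen in the simplest case: when per-step rewards are independent noisy terms, the variance of an undiscounted Monte-Carlo return grows linearly with the horizon. An illustration of that effect only, not of the paper's estimator; the reward distribution and sample counts are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def return_variance(horizon, n_rollouts=50_000):
    # Simulate n_rollouts trajectories whose per-step rewards are
    # i.i.d. N(1, 1); the undiscounted return is their sum, so its
    # variance equals the horizon.
    rewards = rng.normal(loc=1.0, scale=1.0, size=(n_rollouts, horizon))
    returns = rewards.sum(axis=1)
    return returns.var()

print(return_variance(10))   # ≈ 10
print(return_variance(100))  # ≈ 100
```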

Subgaussian importance sampling for off-policy evaluation and learning

AM Metelli, A Russo, M Restelli - ICML-21 Workshop on …, 2021 - lyang36.github.io
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …

Policy Optimization via Optimal Policy Evaluation

AM Metelli, S Meta, M Restelli - Deep RL Workshop NeurIPS 2021 - openreview.net
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed as a what-if …

Curriculum learning in reinforcement learning

SS Narvekar - 2021 - repositories.lib.utexas.edu
In recent years, reinforcement learning (RL) has been increasingly successful at solving
complex tasks. Despite these successes, one of the fundamental challenges is that many RL …