What matters in on-policy reinforcement learning? A large-scale empirical study

M Andrychowicz, A Raichuk, P Stańczyk… - arXiv preprint arXiv …, 2020 - arxiv.org
In recent years, on-policy reinforcement learning (RL) has been successfully applied to
many different continuous control tasks. While RL algorithms are often conceptually simple …

What matters for on-policy deep actor-critic methods? A large-scale study

M Andrychowicz, A Raichuk, P Stańczyk… - International …, 2021 - openreview.net
In recent years, reinforcement learning (RL) has been successfully applied to many different
continuous control tasks. While RL algorithms are often conceptually simple, their state-of …

Evolving rewards to automate reinforcement learning

A Faust, A Francis, D Mehta - arXiv preprint arXiv:1905.07628, 2019 - arxiv.org
Many continuous control tasks have easily formulated objectives, yet using them directly as
a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many …

AWAC: Accelerating online reinforcement learning with offline datasets

A Nair, A Gupta, M Dalal, S Levine - arXiv preprint arXiv:2006.09359, 2020 - arxiv.org
Reinforcement learning (RL) provides an appealing formalism for learning control policies
from experience. However, the classic active formulation of RL necessitates a lengthy active …

P3O: Policy-on policy-off policy optimization

R Fakoor, P Chaudhari… - Uncertainty in artificial …, 2020 - proceedings.mlr.press
On-policy reinforcement learning (RL) algorithms have high sample complexity while
off-policy algorithms are difficult to tune. Merging the two holds the promise to develop efficient …

Keep doing what worked: Behavioral modelling priors for offline reinforcement learning

NY Siegel, JT Springenberg, F Berkenkamp… - arXiv preprint arXiv …, 2020 - arxiv.org
Off-policy reinforcement learning algorithms promise to be applicable in settings where only
a fixed data-set (batch) of environment interactions is available and no new experience can …

Challenges of real-world reinforcement learning: definitions, benchmarks and analysis

G Dulac-Arnold, N Levine, DJ Mankowitz, J Li… - Machine Learning, 2021 - Springer
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …

Advantage-weighted regression: Simple and scalable off-policy reinforcement learning

XB Peng, A Kumar, G Zhang, S Levine - arXiv preprint arXiv:1910.00177, 2019 - arxiv.org
In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that
uses standard supervised learning methods as subroutines. Our goal is an algorithm that …

The in-sample softmax for offline reinforcement learning

C Xiao, H Wang, Y Pan, A White, M White - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) agents can leverage batches of previously collected data to
extract a reasonable control policy. An emerging issue in this offline RL setting, however, is …

An empirical investigation of the challenges of real-world reinforcement learning

G Dulac-Arnold, N Levine, DJ Mankowitz, J Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …