AWAC: Accelerating online reinforcement learning with offline datasets

A Nair, A Gupta, M Dalal, S Levine - arXiv preprint arXiv:2006.09359, 2020 - arxiv.org
Reinforcement learning (RL) provides an appealing formalism for learning control policies
from experience. However, the classic active formulation of RL necessitates a lengthy active …

LAPO: Latent-variable advantage-weighted policy optimization for offline reinforcement learning

X Chen, A Ghadirzadeh, T Yu, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new samples. This setting …

Latent-variable advantage-weighted policy optimization for offline RL

X Chen, A Ghadirzadeh, T Yu, Y Gao, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions. This setting …

A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

The in-sample softmax for offline reinforcement learning

C Xiao, H Wang, Y Pan, A White, M White - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) agents can leverage batches of previously collected data to
extract a reasonable control policy. An emerging issue in this offline RL setting, however, is …

D4RL: Datasets for deep data-driven reinforcement learning

J Fu, A Kumar, O Nachum, G Tucker… - arXiv preprint arXiv …, 2020 - arxiv.org
The offline reinforcement learning (RL) setting (also known as full batch RL), where a policy
is learned from a static dataset, is compelling as progress enables RL methods to take …

Keep doing what worked: Behavioral modelling priors for offline reinforcement learning

NY Siegel, JT Springenberg, F Berkenkamp… - arXiv preprint arXiv …, 2020 - arxiv.org
Off-policy reinforcement learning algorithms promise to be applicable in settings where only
a fixed data-set (batch) of environment interactions is available and no new experience can …

Deployment-efficient reinforcement learning via model-based offline optimization

T Matsushima, H Furuta, Y Matsuo, O Nachum… - arXiv preprint arXiv …, 2020 - arxiv.org
Most reinforcement learning (RL) algorithms assume online access to the environment, in
which one may readily interleave updates to the policy with experience collection using that …

A workflow for offline model-free robotic reinforcement learning

A Kumar, A Singh, S Tian, C Finn, S Levine - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior
experience, without any online interaction. This can allow robots to acquire generalizable …

Evolving rewards to automate reinforcement learning

A Faust, A Francis, D Mehta - arXiv preprint arXiv:1905.07628, 2019 - arxiv.org
Many continuous control tasks have easily formulated objectives, yet using them directly as
a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many …