Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
… We will cover a variety of offline reinforcement learning methods studied in the literature.
For each one, we will discuss the conceptual challenges, and initial steps taken to mitigate …

A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - … Networks and Learning …, 2023 - ieeexplore.ieee.org
Offline RL is a paradigm that learns exclusively from static … Effective offline RL algorithms
have a much wider range of … a unifying taxonomy to classify offline RL methods. Furthermore, …

An optimistic perspective on offline reinforcement learning

R Agarwal, D Schuurmans… - … on machine learning, 2020 - proceedings.mlr.press
reinforcement learning (RL) using a fixed offline dataset of logged interactions is an important
consideration in real world applications. This paper studies offline … in the offline setting, we …

Conservative Q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc
… outperforms existing offline RL methods, often learning policies that attain 2-5 times higher
final return, especially when learning from complex and multi-modal data distributions. …

A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc
… in offline RL. In this paper, we ask: can we make a deep RL algorithm work offline with
minimal changes? We find that we can match the performance of state-of-the-art offline RL …

MOReL: Model-based offline reinforcement learning

R Kidambi, A Rajeswaran… - Advances in neural …, 2020 - proceedings.neurips.cc
offline RL, which allows for data driven policy learning using pre-collected datasets. The ability
to train policies offline can … online learning. Since the dataset has already been collected, …

Online and offline reinforcement learning by planning with a learned model

J Schrittwieser, T Hubert, A Mandhane… - Advances in …, 2021 - proceedings.neurips.cc
… for data efficient learning and offline RL, leading to MuZero Unplugged. We demonstrate
its effectiveness for the online case through results on Atari and for the offline case through …

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc
… -scale language modeling instead of those normally associated with control, we find that this
approach is effective in imitation learning, goal-reaching, and offline reinforcement learning. …

Offline reinforcement learning with implicit Q-learning

I Kostrikov, A Nair, S Levine - arXiv preprint arXiv:2110.06169, 2021 - arxiv.org
… In this section, we discuss how our approach is related to prior work on offline
reinforcement learning. In particular, we discuss connections to BCQ (Fujimoto et al., 2019). …

Behavior regularized offline reinforcement learning

Y Wu, G Tucker, O Nachum - arXiv preprint arXiv:1911.11361, 2019 - arxiv.org
… fails to learn a good policy (e.g., in Walker2d) in the offline setting. Using k = 4 has a small
advantage compared to k = 2 except in Hopper. Both k = 2 and k = 4 significantly improve …