CORL: Research-oriented deep offline reinforcement learning library

D Tarasov, A Nikulin, D Akimov… - Advances in …, 2024 - proceedings.neurips.cc
CORL is an open-source library that provides thoroughly benchmarked single-file
implementations of both deep offline and offline-to-online reinforcement learning algorithms …

Should I run offline reinforcement learning or behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - International Conference on …, 2021 - openreview.net
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing only
previously collected experience, without any online interaction. While it is widely understood …

Supported policy optimization for offline reinforcement learning

J Wu, H Wu, Z Qiu, J Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Policy constraint methods to offline reinforcement learning (RL) typically utilize
parameterization or regularization that constrains the policy to perform actions within the …

When should we prefer offline reinforcement learning over behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - arXiv preprint arXiv:2204.05618, 2022 - arxiv.org
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing
previously collected experience, without any online interaction. It is widely understood that …

Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press
Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

Offline imitation learning with suboptimal demonstrations via relaxed distribution matching

L Yu, T Yu, J Song, W Neiswanger… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Offline imitation learning (IL) promises the ability to learn performant policies from pre-
collected demonstrations without interactions with the environment. However, imitating …

A survey of demonstration learning

A Correia, LA Alexandre - Robotics and Autonomous Systems, 2024 - Elsevier
With the fast improvement of machine learning, reinforcement learning (RL) has been used
to automate human tasks in different areas. However, training such agents is difficult and …

Q-ensemble for offline RL: Don't scale the ensemble, scale the batch size

A Nikulin, V Kurenkov, D Tarasov, D Akimov… - arXiv preprint arXiv …, 2022 - arxiv.org
Training large neural networks is known to be time-consuming, with the learning duration
taking days or even weeks. To address this problem, large-batch optimization was …

User-interactive offline reinforcement learning

P Swazinna, S Udluft, T Runkler - arXiv preprint arXiv:2205.10629, 2022 - arxiv.org
Offline reinforcement learning algorithms still lack trust in practice due to the risk that the
learned policy performs worse than the original policy that generated the dataset or behaves …