Showing your offline reinforcement learning work: Online evaluation budget matters

D Tarasov, A Nikulin, D Akimov… - Advances in …, 2024 - proceedings.neurips.cc

CORL is an open-source library that provides thoroughly benchmarked single-file
implementations of both deep offline and offline-to-online reinforcement learning algorithms …

Cited by 83 · Related articles · All 6 versions

[PDF] openreview.net

Should I run offline reinforcement learning or behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - International Conference on …, 2021 - openreview.net

Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing only
previously collected experience, without any online interaction. While it is widely understood …

Cited by 79 · Related articles · All 2 versions

[PDF] neurips.cc

Supported policy optimization for offline reinforcement learning

J Wu, H Wu, Z Qiu, J Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc

Policy constraint methods to offline reinforcement learning (RL) typically utilize
parameterization or regularization that constrains the policy to perform actions within the …

Cited by 72 · Related articles · All 9 versions

[PDF] arxiv.org

When should we prefer offline reinforcement learning over behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - arXiv preprint arXiv:2204.05618, 2022 - arxiv.org

Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing
previously collected experience, without any online interaction. It is widely understood that …

Cited by 70 · Related articles · All 2 versions

[PDF] mlr.press

Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press

Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …

Cited by 27 · Related articles · All 6 versions

[PDF] neurips.cc

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc

Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

Cited by 26 · Related articles · All 6 versions

[PDF] aaai.org

Offline imitation learning with suboptimal demonstrations via relaxed distribution matching

L Yu, T Yu, J Song, W Neiswanger… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Offline imitation learning (IL) promises the ability to learn performant policies from pre-
collected demonstrations without interactions with the environment. However, imitating …

Cited by 15 · Related articles · All 4 versions

[HTML] sciencedirect.com

[HTML] A survey of demonstration learning

A Correia, LA Alexandre - Robotics and Autonomous Systems, 2024 - Elsevier

With the fast improvement of machine learning, reinforcement learning (RL) has been used
to automate human tasks in different areas. However, training such agents is difficult and …

Cited by 13 · Related articles · All 3 versions

[PDF] arxiv.org

Q-ensemble for offline RL: Don't scale the ensemble, scale the batch size

A Nikulin, V Kurenkov, D Tarasov, D Akimov… - arXiv preprint arXiv …, 2022 - arxiv.org

Training large neural networks is known to be time-consuming, with the learning duration
taking days or even weeks. To address this problem, large-batch optimization was …

Cited by 19 · Related articles · All 4 versions

[PDF] arxiv.org

User-interactive offline reinforcement learning

P Swazinna, S Udluft, T Runkler - arXiv preprint arXiv:2205.10629, 2022 - arxiv.org

Offline reinforcement learning algorithms still lack trust in practice due to the risk that the
learned policy performs worse than the original policy that generated the dataset or behaves …

Cited by 14 · Related articles · All 4 versions
