Bigger, Better, Faster: Human-level Atari with human-level efficiency

M Schwarzer, JSO Ceron, A Courville… - International …, 2023 - proceedings.mlr.press
We introduce a value-based RL agent, which we call BBF, that achieves super-human
performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used …

Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling

K Nottingham, P Ammanabrolu, A Suhr… - International …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of
the world. However, if initialized with knowledge of high-level subgoals and transitions …

Deep reinforcement learning with plasticity injection

E Nikishin, J Oh, G Ostrovski, C Lyle… - Advances in …, 2024 - proceedings.neurips.cc
A growing body of evidence suggests that neural networks employed in deep reinforcement
learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the …

GKD: Generalized knowledge distillation for auto-regressive sequence models

R Agarwal, N Vieillard, P Stanczyk, S Ramos… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation is commonly used for compressing neural networks to reduce their
inference cost and memory footprint. However, current distillation methods for auto …

Accelerating exploration with unlabeled prior data

Q Li, J Zhang, D Ghosh, A Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Learning to solve tasks from a sparse reward signal is a major challenge for standard
reinforcement learning (RL) algorithms. However, in the real world, agents rarely need to …

TGRL: An algorithm for teacher guided reinforcement learning

I Shenfeld, ZW Hong, A Tamar… - … on Machine Learning, 2023 - proceedings.mlr.press
We consider solving sequential decision-making problems in the scenario where the agent
has access to two supervision sources: reward signal and a teacher …

PROTO: Iterative policy regularized offline-to-online reinforcement learning

J Li, X Hu, H Xu, J Liu, X Zhan, YQ Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining
and online finetuning, promises enhanced sample efficiency and policy performance …

Fair collaborative vehicle routing: A deep multi-agent reinforcement learning approach

S Mak, L Xu, T Pearce, M Ostroumov… - … Research Part C …, 2023 - Elsevier
Collaborative vehicle routing occurs when carriers collaborate by sharing their
transportation requests and performing requests on behalf of each other. This …

Policy adaptation from foundation model feedback

Y Ge, A Macaluso, LE Li, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent progress on vision-language foundation models has brought significant
advancement to building general-purpose robots. By using the pre-trained models to …

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

S Huang, Q Gallouédec, F Felten, A Raffin… - arXiv preprint arXiv …, 2024 - arxiv.org
In many Reinforcement Learning (RL) papers, learning curves are useful indicators to
measure the effectiveness of RL algorithms. However, the complete raw data of the learning …