Bigger, Better, Faster: Human-level Atari with human-level efficiency

M Schwarzer, JSO Ceron, A Courville… - International …, 2023 - proceedings.mlr.press
We introduce a value-based RL agent, which we call BBF, that achieves super-human
performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used …

Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling

K Nottingham, P Ammanabrolu, A Suhr… - International …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of
the world. However, if initialized with knowledge of high-level subgoals and transitions …

Deep reinforcement learning with plasticity injection

E Nikishin, J Oh, G Ostrovski, C Lyle… - Advances in …, 2024 - proceedings.neurips.cc
A growing body of evidence suggests that neural networks employed in deep reinforcement
learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the …

GKD: Generalized knowledge distillation for auto-regressive sequence models

R Agarwal, N Vieillard, P Stanczyk, S Ramos… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation is commonly used for compressing neural networks to reduce their
inference cost and memory footprint. However, current distillation methods for auto …

Accelerating exploration with unlabeled prior data

Q Li, J Zhang, D Ghosh, A Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Learning to solve tasks from a sparse reward signal is a major challenge for standard
reinforcement learning (RL) algorithms. However, in the real world, agents rarely need to …

TGRL: An algorithm for teacher guided reinforcement learning

I Shenfeld, ZW Hong, A Tamar… - … on Machine Learning, 2023 - proceedings.mlr.press
We consider solving sequential decision-making problems in the scenario where the agent
has access to two supervision sources: reward signal and a teacher …

PROTO: Iterative policy regularized offline-to-online reinforcement learning

J Li, X Hu, H Xu, J Liu, X Zhan, YQ Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining
and online finetuning, promises enhanced sample efficiency and policy performance …

Fair collaborative vehicle routing: A deep multi-agent reinforcement learning approach

S Mak, L Xu, T Pearce, M Ostroumov… - … Research Part C …, 2023 - Elsevier
Collaborative vehicle routing occurs when carriers collaborate by sharing their
transportation requests and performing requests on behalf of each other. This …

Policy adaptation from foundation model feedback

Y Ge, A Macaluso, LE Li, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent progress on vision-language foundation models has brought significant
advancement to building general-purpose robots. By using the pre-trained models to …

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

S Huang, Q Gallouédec, F Felten, A Raffin… - arXiv preprint arXiv …, 2024 - arxiv.org
In many Reinforcement Learning (RL) papers, learning curves are useful indicators to
measure the effectiveness of RL algorithms. However, the complete raw data of the learning …