BALROG: Benchmarking Agentic LLM and VLM Reasoning on Games

D Paglieri, B Cupiał, S Coward, U Piterbarg… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) and Vision Language Models (VLMs) possess extensive
knowledge and exhibit promising reasoning abilities; however, they still struggle to perform …

FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning

J Hu, R Hendrix, A Farhadi, A Kembhavi… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, the Robotics field has initiated several efforts toward building generalist
robot policies through large-scale multi-task Behavior Cloning. However, direct deployments …

From Imitation to Refinement--Residual RL for Precise Visual Assembly

LL Ankile, A Simeonov, I Shenfeld… - … 2024 Workshop on …, 2024 - openreview.net
Recent advances in behavior cloning (BC), like action-chunking and diffusion, have led to
impressive progress. Still, imitation alone remains insufficient for tasks requiring reliable and …

Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

GFlowNet Pretraining with Inexpensive Rewards

M Pandey, G Subbaraj, E Bengio - arXiv preprint arXiv:2409.09702, 2024 - arxiv.org
Generative Flow Networks (GFlowNets), a class of generative models, have recently
emerged as a suitable framework for generating diverse and high-quality molecular …

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Z Zhou, A Peng, Q Li, S Levine, A Kumar - arXiv preprint arXiv:2412.07762, 2024 - arxiv.org
The modern paradigm in machine learning involves pre-training on diverse data, followed
by task-specific fine-tuning. In reinforcement learning (RL), this translates to learning via …

Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

K Yan, AG Schwing, YX Wang - arXiv preprint arXiv:2410.24108, 2024 - arxiv.org
Decision Transformers have recently emerged as a new and compelling paradigm for offline
Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While …

From Imitation to Refinement--Residual RL for Precise Assembly

L Ankile, A Simeonov, I Shenfeld, M Torne… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in behavior cloning (BC), like action-chunking and diffusion, have led to
impressive progress. Still, imitation alone remains insufficient for tasks requiring reliable and …

Augmenting Unsupervised Reinforcement Learning with Self-Reference

A Zhao, E Zhu, R Lu, M Lin, YJ Liu, G Huang - arXiv preprint arXiv …, 2023 - arxiv.org
Humans possess the ability to draw on past experiences explicitly when learning new tasks
and applying them accordingly. We believe this capacity for self-referencing is especially …

MaestroMotif: Skill Design from Artificial Intelligence Feedback

M Klissarov, M Henaff, R Raileanu, S Sodhani… - arXiv preprint arXiv …, 2024 - arxiv.org
Describing skills in natural language has the potential to provide an accessible way to inject
human knowledge about decision-making into an AI system. We present MaestroMotif, a …