LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

X Li, C Mata, J Park, K Kahatapitiya, YS Jang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) equipped with extensive world knowledge and strong
reasoning skills can tackle diverse tasks across domains, often by posing them as …

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

P Li, T Liu, Y Li, M Han, H Geng, S Wang, Y Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous robotic systems capable of learning novel manipulation tasks are poised to
transform industries from manufacturing to service automation. However, modern methods …

Towards Generalist Robot Learning from Internet Video: A Survey

R McCarthy, DCH Tan, D Schmidt, F Acero… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an overview of methods for learning from video (LfV) in the context of
reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large …

Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Z Xiong, R Vuorio, J Beck, M Zimmer, K Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning a universal policy across different robot morphologies can significantly improve
learning efficiency and enable zero-shot generalization to unseen morphologies. However …

A systematic review of major evaluation metrics for simulator-based automatic assessment of driving after stroke

P Taveekitworachai, G Chanmas, P Paliyawan… - Heliyon, 2024 - cell.com
Background: Simulator-based driving assessments (SA) have recently been used and
studied for various purposes, particularly for post-stroke patients. Automating such …

What Foundation Models can Bring for Robot Learning in Manipulation: A Survey

D Li, Y Jin, H Yu, J Shi, X Hao, P Hao, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The realization of universal robots is an ultimate goal of researchers. However, a key hurdle
in achieving this goal lies in the robots' ability to manipulate objects in their unstructured …

Understanding Long Videos in One Multimodal Language Model Pass

K Ranasinghe, X Li, K Kahatapitiya… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs), known to possess strong world knowledge,
have allowed recent approaches to achieve excellent performance on Long-Video …

PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models

D Guo, Y Xiang, S Zhao, X Zhu, M Tomizuka… - arXiv preprint arXiv …, 2024 - arxiv.org
Robotic grasping is a fundamental aspect of robot functionality, defining how robots interact
with objects. Despite substantial progress, its generalizability to counter-intuitive or long …

A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings

V Cohen, JX Liu, R Mooney, S Tellex… - arXiv preprint arXiv …, 2024 - arxiv.org
With large language models, robots can understand language more flexibly and more
capably than ever before. This survey reviews recent literature and situates it into a …

Ego-Foresight: Agent Visuomotor Prediction as Regularization for RL

MS Nunes, A Dehban, Y Demiris… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the significant advancements in Deep Reinforcement Learning (RL) observed in the
last decade, the amount of training experience necessary to learn effective policies remains …