Open-TeleVision: Teleoperation with immersive active visual feedback

X Cheng, J Li, S Yang, G Yang, X Wang - arXiv preprint arXiv:2407.01512, 2024 - arxiv.org
Teleoperation serves as a powerful method for collecting on-robot data essential for robot
learning from demonstrations. The intuitiveness and ease of use of the teleoperation system …

TinyVLA: Towards fast, data-efficient vision-language-action models for robotic manipulation

J Wen, Y Zhu, J Li, M Zhu, K Wu, Z Xu, N Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor
control and instruction comprehension through end-to-end learning processes. However …

Policy adaptation via language optimization: Decomposing tasks for few-shot imitation

V Myers, BC Zheng, O Mees, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org
Learned language-conditioned robot policies often struggle to effectively adapt to new real-world
tasks even when pre-trained across a diverse set of instructions. We propose a novel …

A survey on enhancing reinforcement learning in complex environments: Insights from human and LLM feedback

AR Laleh, MN Ahmadabadi - arXiv preprint arXiv:2411.13410, 2024 - arxiv.org
Reinforcement learning (RL) is one of the active fields in machine learning, demonstrating
remarkable potential in tackling real-world challenges. Despite its promising prospects, this …

VIEW: Visual imitation learning with waypoints

A Jonnavittula, S Parekh, DP Losey - arXiv preprint arXiv:2404.17906, 2024 - arxiv.org
Robots can use Visual Imitation Learning (VIL) to learn everyday tasks from video
demonstrations. However, translating visual observations into actionable robot policies is …

FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes

RK Jenamani, P Sundaresan, M Sakr… - arXiv preprint arXiv …, 2024 - arxiv.org
Robot-assisted feeding has the potential to improve the quality of life for individuals with
mobility limitations who are unable to feed themselves independently. However, there exists …

Words2contact: Identifying support contacts from verbal instructions using foundation models

D Totsila, Q Rouxel, JB Mouret… - 2024 IEEE-RAS 23rd …, 2024 - ieeexplore.ieee.org
This paper presents Words2Contact, a language-guided multi-contact placement pipeline
leveraging large language models and vision language models. Our method is a key …

The Ingredients for Robotic Diffusion Transformers

S Dasari, O Mees, S Zhao, MK Srirama… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years roboticists have achieved remarkable progress in solving increasingly
general tasks on dexterous robotic hardware by leveraging high capacity Transformer …

Autonomous interactive correction MLLM for robust robotic manipulation

C Xiong, C Shen, X Li, K Zhou, J Liu… - … Annual Conference on …, 2024 - openreview.net
The ability to reflect on and correct failures is crucial for robotic systems to interact stably with
real-life objects. Observing the generalization and reasoning capabilities of Multimodal …

InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation

A Lee, I Chuang, LY Chen, I Soltani - arXiv preprint arXiv:2409.07914, 2024 - arxiv.org
Bimanual manipulation presents unique challenges compared to unimanual tasks due to the
complexity of coordinating two robotic arms. In this paper, we introduce InterACT: Inter …