SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

J Zhang, C Bai, H He, W Xia, Z Wang, B Zhao, et al. arXiv preprint, 2024.
Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of
scene understanding and action prediction. Current methods employ both 3D representation …

R3M: A universal visual representation for robot manipulation

S Nair, A Rajeswaran, V Kumar, C Finn, et al. arXiv preprint, 2022.
We study how visual representations pre-trained on diverse human video data can enable
data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a …

Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion

W Tan, B Liu, J Zhang, R Song, J Fu. arXiv preprint arXiv:2403.07312, 2024.
Modeling a generalized visuomotor policy has been a longstanding challenge for both
computer vision and robotics communities. Existing approaches often fail to efficiently …

Learning Manipulation by Predicting Interaction

J Zeng, Q Bu, B Wang, W Xia, L Chen, H Dong, et al. arXiv preprint, 2024.
Representation learning approaches for robotic manipulation have boomed in recent years.
Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large …

Act3D: Infinite resolution action detection transformer for robotic manipulation

T Gervet, Z Xian, N Gkanatsios, et al. arXiv preprint, 2023.
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …

Learning high-level robotic manipulation actions with visual predictive model

A Ma, G Chi, S Ivaldi, L Chen. Complex & Intelligent Systems, 2024.
Learning visual predictive models has great potential for real-world robot manipulation.
Visual predictive models serve as a model of real-world dynamics to comprehend the …

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

P Li, T Liu, Y Li, M Han, H Geng, S Wang, Y Zhu, et al. arXiv preprint, 2024.
Autonomous robotic systems capable of learning novel manipulation tasks are poised to
transform industries from manufacturing to service automation. However, modern methods …

Scaling Manipulation Learning with Visual Kinematic Chain Prediction

X Zhang, Y Liu, H Chang, A Boularias. arXiv preprint arXiv:2406.07837, 2024.
Learning general-purpose models from diverse datasets has achieved great success in
machine learning. In robotics, however, existing methods in multi-task learning are typically …

Learning multi-step robotic manipulation policies from visual observation of scene and Q-value predictions of previous action

S Kumra, S Joshi, F Sahin. 2022 International Conference on …, IEEE, 2022.
In this work, we focus on multi-step manipulation tasks that involve long-horizon planning
and progress reversal. Such tasks interlace high-level reasoning that consists of …

AlphaBlock: Embodied finetuning for vision-language reasoning in robot manipulation

C Jin, W Tan, J Yang, B Liu, R Song, L Wang, et al. arXiv preprint, 2023.
We propose a novel framework for learning high-level cognitive capabilities in robot
manipulation tasks, such as making a smiley face using building blocks. These tasks often …