Unfolding the literature: A review of robotic cloth manipulation

A Longhini, Y Wang, I Garcia-Camacho… - Annual Review of …, 2024 - annualreviews.org
The realm of textiles spans clothing, households, healthcare, sports, and industrial
applications. The deformable nature of these objects poses unique challenges that prior …

Data scaling laws in imitation learning for robotic manipulation

F Lin, Y Hu, P Sheng, C Wen, J You, Y Gao - arXiv preprint arXiv …, 2024 - arxiv.org
Data scaling has revolutionized fields like natural language processing and computer vision,
providing models with remarkable generalization capabilities. In this paper, we investigate …

Learning to manipulate anywhere: A visual generalizable framework for reinforcement learning

Z Yuan, T Wei, S Cheng, G Zhang, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Can we endow visuomotor robots with generalization capabilities to operate in diverse open-
world scenarios? In this paper, we propose Maniwhere, a generalizable framework …

Green screen augmentation enables scene generalisation in robotic manipulation

E Teoh, S Patidar, X Ma, S James - arXiv preprint arXiv:2407.07868, 2024 - arxiv.org
Generalising vision-based manipulation policies to novel environments remains a
challenging area with limited exploration. Current practices involve collecting data in one …

Aha: A vision-language-model for detecting and reasoning over failures in robotic manipulation

J Duan, W Pumacay, N Kumar, YR Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Robotic manipulation in open-world settings requires not only task execution but also the
ability to detect and learn from failures. While recent advances in vision-language models …

Generative image as action models

M Shridhar, YL Lo, S James - arXiv preprint arXiv:2407.07875, 2024 - arxiv.org
Image-generation diffusion models have been fine-tuned to unlock new capabilities such as
image-editing and novel view synthesis. Can we similarly unlock image-generation models …

3D-MVP: 3D multiview pretraining for robotic manipulation

S Qian, K Mo, V Blukis, DF Fouhey, D Fox… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent works have shown that visual pretraining on egocentric datasets using masked
autoencoders (MAE) can improve generalization for downstream robotics tasks. However …

Recasting generic pretrained vision transformers as object-centric scene encoders for manipulation policies

J Qian, A Panagopoulos, D Jayaraman - arXiv preprint arXiv:2405.15916, 2024 - arxiv.org
Generic re-usable pre-trained image representation encoders have become a standard
component of methods for many computer vision tasks. As visual representations for robots …

Position: Scaling simulation is neither necessary nor sufficient for in-the-wild robot manipulation

H Bharadhwaj - Forty-first International Conference on Machine …, 2024 - openreview.net
In this paper, we develop a structured critique of robotic simulations for real-world
manipulation, by arguing that scaling simulators is neither necessary nor sufficient for …

Investigating the role of instruction variety and task difficulty in robotic manipulation tasks

A Parekh, N Vitsakis, A Suglia, I Konstas - arXiv preprint arXiv:2407.03967, 2024 - arxiv.org
Evaluating the generalisation capabilities of multimodal models based solely on their
performance on out-of-distribution data fails to capture their true robustness. This work …