Perceiver-actor: A multi-task transformer for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on Robot …, 2023 - proceedings.mlr.press

Transformers have revolutionized vision and natural language processing with their ability to
scale with large datasets. But in robotic manipulation, data is both limited and expensive …

Cited by 335 · Related articles · All 5 versions

[PDF] arxiv.org

Inner monologue: Embodied reasoning through planning with language models

W Huang, F Xia, T Xiao, H Chan, J Liang… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …

Cited by 663 · Related articles · All 5 versions

[PDF] mlr.press

Scaling up and distilling down: Language-guided robot skill acquisition

H Ha, P Florence, S Song - Conference on Robot Learning, 2023 - proceedings.mlr.press

We present a framework for robot skill acquisition, which 1) efficiently scales up the
generation of language-labelled robot data and 2) effectively distills this data down into a …

Cited by 75 · Related articles · All 7 versions

[PDF] arxiv.org

Socratic models: Composing zero-shot multimodal reasoning with language

A Zeng, M Attarian, B Ichter, K Choromanski… - arXiv preprint arXiv …, 2022 - arxiv.org

Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the
domain of data they are trained on. While these domains are generic, they may only barely …

Cited by 417 · Related articles · All 6 versions

[PDF] arxiv.org

Diffusion policy: Visuomotor policy learning via action diffusion

C Chi, S Feng, Y Du, Z Xu, E Cousineau… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper introduces Diffusion Policy, a new way of generating robot behavior by
representing a robot's visuomotor policy as a conditional denoising diffusion process. We …

Cited by 252 · Related articles · All 6 versions

[PDF] mlr.press

Real-world robot learning with masked visual pre-training

I Radosavovic, T Xiao, S James… - … on Robot Learning, 2023 - proceedings.mlr.press

In this work, we explore self-supervised visual pre-training on images from diverse, in-the-
wild videos for real-world robotic tasks. Like prior work, our visual representations are pre …

Cited by 186 · Related articles · All 4 versions

[PDF] arxiv.org

Learning fine-grained bimanual manipulation with low-cost hardware

TZ Zhao, V Kumar, S Levine, C Finn - arXiv preprint arXiv:2304.13705, 2023 - arxiv.org

Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously
difficult for robots because they require precision, careful coordination of contact forces, and …

Cited by 188 · Related articles · All 5 versions

[PDF] mlr.press

CLIPort: What and where pathways for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on robot learning, 2022 - proceedings.mlr.press

How can we imbue robots with the ability to manipulate objects precisely but also to reason
about them in terms of abstract concepts? Recent works in manipulation have shown that …

Cited by 536 · Related articles · All 8 versions

[PDF] thecvf.com

Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

Cited by 78 · Related articles · All 9 versions

[PDF] neurips.cc

Behavior Transformers: Cloning k modes with one stone

NM Shafiullah, Z Cui… - Advances in neural …, 2022 - proceedings.neurips.cc

While behavior learning has made impressive progress in recent times, it lags behind
computer vision and natural language processing due to its inability to leverage large …

Cited by 131 · Related articles · All 6 versions
