What matters in language conditioned robotic imitation learning over unstructured data

A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions

A Barua, MU Ahmed, S Begum - IEEE Access, 2023 - ieeexplore.ieee.org

Multimodal machine learning (MML) is a tempting multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …

Cited by 46 · Related articles · All 4 versions

[PDF] arxiv.org

Rt-2: Vision-language-action models transfer web knowledge to robotic control

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2023 - arxiv.org

We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

Cited by 730 · Related articles · All 2 versions

[PDF] arxiv.org

Open x-embodiment: Robotic learning datasets and rt-x models

A O'Neill, A Rehman, A Gupta, A Maddukuri… - arXiv preprint arXiv …, 2023 - arxiv.org

Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Cited by 285 · Related articles · All 2 versions

[PDF] arxiv.org

Voxposer: Composable 3d value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

Cited by 429 · Related articles · All 6 versions

[HTML] mlr.press

[HTML] Rt-2: Vision-language-action models transfer web knowledge to robotic control

B Zitkovich, T Yu, S Xu, P Xu, T Xiao… - … on Robot Learning, 2023 - proceedings.mlr.press

We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

Cited by 183 · Related articles · All 2 versions

[PDF] mlr.press

Perceiver-actor: A multi-task transformer for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on Robot …, 2023 - proceedings.mlr.press

Transformers have revolutionized vision and natural language processing with their ability to
scale with large datasets. But in robotic manipulation, data is both limited and expensive …

Cited by 446 · Related articles · All 5 versions

[PDF] mlr.press

Scaling up and distilling down: Language-guided robot skill acquisition

H Ha, P Florence, S Song - Conference on Robot Learning, 2023 - proceedings.mlr.press

We present a framework for robot skill acquisition, which 1) efficiently scale up data
generation of language-labelled robot data and 2) effectively distills this data down into a …

Cited by 119 · Related articles · All 7 versions

[PDF] arxiv.org

Inner monologue: Embodied reasoning through planning with language models

W Huang, F Xia, T Xiao, H Chan, J Liang… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …

Cited by 846 · Related articles · All 5 versions

[PDF] ed.ac.uk

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

A O'Neill, A Rehman, A Maddukuri… - … on Robotics and …, 2024 - ieeexplore.ieee.org

Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Cited by 117 · Related articles

[PDF] arxiv.org

Octo: An open-source generalist robot policy

OM Team, D Ghosh, H Walke, K Pertsch… - arXiv preprint arXiv …, 2024 - arxiv.org

Large policies pretrained on diverse robot datasets have the potential to transform robotic
learning: instead of training new policies from scratch, such generalist robot policies may be …

Cited by 150 · Related articles · All 3 versions
