BridgeData V2: A dataset for robot learning at scale

J Wang, Z Wu, Y Li, H Jiang, P Shu, E Shi, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have undergone significant expansion and have been
increasingly integrated across various domains. Notably, in the realm of robot task planning …

Cited by: 13 · Related articles · All 3 versions

Open X-Embodiment: Robotic learning datasets and RT-X models

A Padalkar, A Pooley, A Jain, A Bewley… - arXiv preprint arXiv …, 2023 - arxiv.org

Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Cited by: 114 · Related articles · All 2 versions

Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation

C Eze, C Crick - arXiv preprint arXiv:2402.07127, 2024 - arxiv.org

Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased
datasets. While curated datasets can help, challenges remain in generalizability and real …

Cited by: 4 · Related articles · All 2 versions

GELLO: A general, low-cost, and intuitive teleoperation framework for robot manipulators

P Wu, Y Shentu, Z Yi, X Lin, P Abbeel - arXiv preprint arXiv:2309.13037, 2023 - arxiv.org

Imitation learning from human demonstrations is a powerful framework to teach robots new
skills. However, the performance of the learned policies is bottlenecked by the quality, scale …

Cited by: 10 · Related articles · All 4 versions

Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill

W Cai, S Huang, G Cheng, Y Long, P Gao… - arXiv preprint arXiv …, 2023 - arxiv.org

Zero-shot object navigation is a challenging task for home-assistance robots. This task
emphasizes visual grounding, commonsense inference and locomotion abilities, where the …

Cited by: 9 · Related articles · All 2 versions

3D-VLA: A 3D vision-language-action generative world model

H Zhen, X Qiu, P Chen, J Yang, X Yan, Y Du… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the
broader realm of the 3D physical world. Furthermore, they perform action prediction by …

Cited by: 5 · Related articles · All 3 versions

Robot fine-tuning made easy: Pre-training rewards and policies for autonomous real-world reinforcement learning

J Yang, MS Mark, B Vu, A Sharma, J Bohg… - arXiv preprint arXiv …, 2023 - arxiv.org

The pre-train and fine-tune paradigm in machine learning has had dramatic success in a
wide range of domains because the use of existing data or pre-trained models on the …

Cited by: 4 · Related articles · All 5 versions

Large-scale actionless video pre-training via discrete diffusion for efficient policy learning

H He, C Bai, L Pan, W Zhang, B Zhao, X Li - arXiv preprint arXiv …, 2024 - arxiv.org

Learning a generalist embodied agent capable of completing multiple tasks poses
challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In …

Cited by: 4 · Related articles · All 2 versions

Score models for offline goal-conditioned reinforcement learning

H Sikchi, R Chitnis, A Touati, A Geramifard… - arXiv preprint arXiv …, 2023 - arxiv.org

Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve
multiple goals in an environment purely from offline datasets using sparse reward functions …

Cited by: 2 · Related articles · All 5 versions

Pre-trained text-to-image diffusion models are versatile representation learners for control

G Gupta, K Yadav, Y Gal, D Batra, Z Kira, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org

Embodied AI agents require a fine-grained understanding of the physical world mediated
through visual and language inputs. Such capabilities are difficult to learn solely from task …

Cited by: 1 · Related articles · All 4 versions
