Human motion generation: A survey

W Zhu, X Ma, D Ro, H Ci, J Zhang, J Shi… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 - ieeexplore.ieee.org
Human motion generation aims to generate natural human pose sequences and shows
immense potential for real-world applications. Substantial progress has been made recently …

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

TidyBot: Personalized robot assistance with large language models

J Wu, R Antonova, A Kan, M Lepert, A Zeng, S Song… - Autonomous Robots, 2023 - Springer
For a robot to personalize physical assistance effectively, it must learn user preferences that
can be generally reapplied to future scenarios. In this work, we investigate personalization of …

Scalable 3D captioning with pretrained models

T Luo, C Rockwell, H Lee… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects.
This approach utilizes pretrained models from image captioning, image-text alignment, and …

MimicPlay: Long-horizon imitation learning by watching human play

C Wang, L Fan, J Sun, R Zhang, L Fei-Fei, D Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Imitation learning from human demonstrations is a promising paradigm for teaching robots
manipulation skills in the real world. However, learning complex long-horizon tasks often …

Building cooperative embodied agents modularly with large language models

H Zhang, W Du, J Shan, Q Zhou, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive planning abilities in single-
agent embodied tasks across various domains. However, their capacity for planning and …

VIMA: Robot manipulation with multimodal prompts

Y Jiang, A Gupta, Z Zhang, G Wang, Y Dou, Y Chen… - 2023 - openreview.net
Prompt-based learning has emerged as a successful paradigm in natural language
processing, where a single general-purpose language model can be instructed to perform …

Octopus: Embodied vision-language programmer from environmental feedback

J Yang, Y Dong, S Liu, B Li, Z Wang, C Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large vision-language models (VLMs) have achieved substantial progress in multimodal
perception and reasoning. Furthermore, when seamlessly integrated into an embodied …

ManiSkill2: A unified benchmark for generalizable manipulation skills

J Gu, F Xiang, X Li, Z Ling, X Liu, T Mu, Y Tang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generalizable manipulation skills, which can be composed to tackle long-horizon and
complex daily chores, are one of the cornerstones of Embodied AI. However, existing …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …