Human motion generation: A survey

W Zhu, X Ma, D Ro, H Ci, J Zhang, J Shi… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 - ieeexplore.ieee.org
Human motion generation aims to generate natural human pose sequences and shows
immense potential for real-world applications. Substantial progress has been made recently …

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

TidyBot: Personalized robot assistance with large language models

J Wu, R Antonova, A Kan, M Lepert, A Zeng, S Song… - Autonomous Robots, 2023 - Springer
For a robot to personalize physical assistance effectively, it must learn user preferences that
can be generally reapplied to future scenarios. In this work, we investigate personalization of …

Scalable 3D captioning with pretrained models

T Luo, C Rockwell, H Lee… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects.
This approach utilizes pretrained models from image captioning, image-text alignment, and …

MimicPlay: Long-horizon imitation learning by watching human play

C Wang, L Fan, J Sun, R Zhang, L Fei-Fei, D Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Imitation learning from human demonstrations is a promising paradigm for teaching robots
manipulation skills in the real world. However, learning complex long-horizon tasks often …

Building cooperative embodied agents modularly with large language models

H Zhang, W Du, J Shan, Q Zhou, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive planning abilities in single-
agent embodied tasks across various domains. However, their capacity for planning and …

VIMA: Robot manipulation with multimodal prompts

Y Jiang, A Gupta, Z Zhang, G Wang, Y Dou, Y Chen… - 2023 - openreview.net
Prompt-based learning has emerged as a successful paradigm in natural language
processing, where a single general-purpose language model can be instructed to perform …

Octopus: Embodied vision-language programmer from environmental feedback

J Yang, Y Dong, S Liu, B Li, Z Wang, C Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large vision-language models (VLMs) have achieved substantial progress in multimodal
perception and reasoning. Furthermore, when seamlessly integrated into an embodied …

ManiSkill2: A unified benchmark for generalizable manipulation skills

J Gu, F Xiang, X Li, Z Ling, X Liu, T Mu, Y Tang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generalizable manipulation skills, which can be composed to tackle long-horizon and
complex daily chores, are one of the cornerstones of Embodied AI. However, existing …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …