MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

A survey of reasoning with foundation models

J Sun, C Zheng, E Xie, Z Liu, R Chu, J Qiu, J Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …

Closed-loop open-vocabulary mobile manipulation with GPT-4V

P Zhi, Z Zhang, M Han, Z Zhang, Z Li, Z Jiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous robot navigation and manipulation in open environments require reasoning
and replanning with closed-loop feedback. We present COME-robot, the first closed-loop …

CoPa: General robotic manipulation through spatial constraints of parts with foundation models

H Huang, F Lin, Y Hu, S Wang, Y Gao - arXiv preprint arXiv:2403.08248, 2024 - arxiv.org
Foundation models pre-trained on web-scale data are shown to encapsulate extensive
world knowledge beneficial for robotic manipulation in the form of task planning. However …

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

S Nasiriany, F Xia, W Yu, T Xiao, J Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision language models (VLMs) have shown impressive capabilities across a variety of
tasks, from logical reasoning to visual understanding. This opens the door to richer …

Learning-based legged locomotion; state of the art and future perspectives

S Ha, J Lee, M van de Panne, Z Xie, W Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Legged locomotion holds the premise of universal mobility, a critical capability for many real-
world robotic applications. Both model-based and learning-based approaches have …

VoicePilot: Harnessing LLMs as speech interfaces for physically assistive robots

A Padmanabha, J Yuan, J Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Physically assistive robots present an opportunity to significantly increase the well-being
and independence of individuals with motor impairments or other forms of disability who are …

Neural Scaling Laws for Embodied AI

S Sartor, N Thompson - arXiv preprint arXiv:2405.14005, 2024 - arxiv.org
Scaling laws have driven remarkable progress across machine learning domains like
language modeling and computer vision. However, the exploration of scaling laws in …

Recommendations for designing conversational companion robots with older adults through foundation models

B Irfan, S Kuoppamäki, G Skantze - Frontiers in Robotics and AI, 2024 - frontiersin.org
Companion robots are aimed to mitigate loneliness and social isolation among older adults
by providing social and emotional support in their everyday lives. However, older adults' …

GPT-Fabric: Folding and Smoothing Fabric by Leveraging Pre-Trained Foundation Models

V Raval, E Zhao, H Zhang, S Nikolaidis… - arXiv preprint arXiv …, 2024 - arxiv.org
Fabric manipulation has applications in folding blankets, handling patient clothing, and
protecting items with covers. It is challenging for robots to perform fabric manipulation since …