Building cooperative embodied agents modularly with large language models

H Zhang, W Du, J Shan, Q Zhou, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive planning abilities in single-
agent embodied tasks across various domains. However, their capacity for planning and …

MindAgent: Emergent gaming interaction

R Gong, Q Huang, X Ma, H Vo, Z Durante… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have the capacity to perform complex scheduling in a
multi-agent system and can coordinate these agents to complete sophisticated tasks that …

The SocialAI school: Insights from developmental psychology towards artificial socio-cultural agents

G Kovač, R Portelas, PF Dominey… - arXiv preprint arXiv …, 2023 - arxiv.org
Developmental psychologists have long established the importance of socio-cognitive
abilities in human intelligence. These abilities enable us to enter, participate and benefit …

Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning

T Zhi-Xuan, L Ying, V Mansinghka… - arXiv preprint arXiv …, 2024 - arxiv.org
People often give instructions whose meaning is ambiguous without further context,
expecting that their actions or goals will disambiguate their intentions. How can we build …

Enhancing Human-AI Collaboration Through Logic-Guided Reasoning

C Cao, Y Fu, S Xu, R Zhang, S Li - The Twelfth International …, 2024 - openreview.net
We present a systematic framework designed to enhance human-robot perception and
collaboration through the integration of logical rules and Theory of Mind (ToM). Logical rules …

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

P Ren, M Li, Z Luo, X Song, Z Chen, W Liufu… - arXiv preprint arXiv …, 2024 - arxiv.org
Realizing scaling laws in embodied AI has become a focus. However, previous work has
been scattered across diverse simulation platforms, with assets and models lacking unified …

Infer Human's Intentions Before Following Natural Language Instructions

Y Wan, Y Wu, Y Wang, J Mao, N Jaques - arXiv preprint arXiv:2409.18073, 2024 - arxiv.org
For AI agents to be helpful to humans, they should be able to follow natural language
instructions to complete everyday cooperative tasks in human environments. However, real …

Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos

D Verma, D Roy, B Fernando - arXiv preprint arXiv:2407.20642, 2024 - arxiv.org
Situation recognition refers to the ability of an agent to identify and understand various
situations or contexts based on available information and sensory inputs. It involves the …

A Landmark-Aware Visual Navigation Dataset

F Johnson, BB Cao, K Dana, S Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
Map representation learned by expert demonstrations has shown promising research value.
However, recent advancements in the visual navigation field face challenges due to the lack …

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

K Zhang, P Ren, B Lin, J Lin, S Ma, H Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Language-guided robotic manipulation is a challenging task that requires an embodied
agent to follow abstract user instructions to accomplish various complex manipulation tasks …