Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Y Yang, T Zhou, K Li, D Tao, L Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
While large language models (LLMs) excel in a simulated world of texts, they struggle to
interact with the more realistic world without perceptions of other modalities such as visual or …

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Y Yang, T Zhou, K Li, D Tao, L Li, L Shen, X He… - arXiv e …, 2023 - ui.adsabs.harvard.edu
While large language models (LLMs) excel in a simulated world of texts, they struggle to
interact with the more realistic world without perceptions of other modalities such as visual or …

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Y Yang, T Zhou, K Li, D Tao, L Li, L Shen, X He… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) excel in a simulated world of texts, they struggle to
interact with the more realistic world without perceptions of other modalities such as visual or …