J Chen, Y Liu, D Li, X An, Z Feng, Y Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent capabilities in instruction following and reasoning, has greatly advanced the field of visual …
D Kevian, U Syed, X Guo, A Havens, G Dullerud… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control …
H Yang, Y Zhang, J Xu, H Lu, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
While Large Language Models (LLMs) have demonstrated exceptional multitasking abilities, fine-tuning these models on downstream, domain-specific datasets is often necessary to …
M Huang, Y Long, X Deng, R Chu, J Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the …
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models …
Y Wang, B Jiang, Y Luo, D He, P Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), such as GPT-3.5, GPT-4 and LLaMA-2, perform surprisingly well and outperform human experts on many tasks. However, in many domain-specific …
X Huang, W Liu, X Chen, X Wang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As Large Language Models (LLMs) have shown significant intelligence, the progress to leverage LLMs as planning modules of autonomous agents has attracted more attention …
C Huang, T Yu, K Xie, S Zhang, L Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Foundation Models (FMs), with their extensive knowledge bases and complex architectures, have offered unique opportunities within the realm of recommender systems …
Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied …