MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

The (r)evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Towards knowledge-driven autonomous driving

X Li, Y Bai, P Cai, L Wen, D Fu, B Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper explores the emerging knowledge-driven autonomous driving technologies. Our
investigation highlights the limitations of current autonomous driving systems, in particular …

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model

D Muhtar, Z Li, F Gu, X Zhang, P Xiao - arXiv preprint arXiv:2402.02544, 2024 - arxiv.org
The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …

Rag-driver: Generalisable driving explanations with retrieval-augmented in-context learning in multi-modal large language model

J Yuan, S Sun, D Omeiza, B Zhao, P Newman… - arXiv preprint arXiv …, 2024 - arxiv.org
Robots powered by 'blackbox' models need to provide human-understandable explanations
which we can trust. Hence, explainability plays a critical role in trustworthy autonomous …

LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

D Fu, W Lei, L Wen, P Cai, S Mao, M Dou, B Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new
avenues in artificial intelligence, particularly for autonomous driving by offering enhanced …

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

S Luo, W Chen, W Tian, R Liu, L Hou… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Foundation models have indeed made a profound impact on various fields, emerging as
pivotal components that significantly shape the capabilities of intelligent systems. In the …

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

Q Wu, H Zhao, M Saxon, T Bui, WY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision language models (VLMs) are an exciting emerging class of language models (LMs)
that have merged classic LM capabilities with those of image processing systems. However …

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

C Pang, J Wu, J Li, Y Liu, J Sun, W Li, X Weng… - arXiv preprint arXiv …, 2024 - arxiv.org
Generic large Vision-Language Models (VLMs) are rapidly developing, but still perform
poorly in the Remote Sensing (RS) domain, which is due to the unique and specialized nature …