Drive like a human: Rethinking autonomous driving with large language models

D Fu, X Li, L Wen, M Dou, P Cai… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we explore the potential of using a large language model (LLM) to understand
the driving environment in a human-like manner and analyze its ability to reason, interpret …

Drive as you speak: Enabling human-like interaction with large language models in autonomous vehicles

C Cui, Y Ma, X Cao, W Ye… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The future of autonomous vehicles lies in the convergence of human-centric design and
advanced AI capabilities. Autonomous vehicles of the future will not only transport …

Multi-modal fusion transformer for end-to-end autonomous driving

A Prakash, K Chitta, A Geiger - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
How should representations from complementary sensors be integrated for autonomous
driving? Geometry-based sensor fusion has shown great promise for perception tasks such …

Charting new territories: Exploring the geographic and geospatial capabilities of multimodal LLMs

J Roberts, T Lüddecke, R Sheikh… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have shown remarkable capabilities across a
broad range of tasks, but their knowledge and abilities in the geographic and geospatial …

Drive anywhere: Generalizable end-to-end autonomous driving with multi-modal foundation models

TH Wang, A Maalouf, W Xiao, Y Ban, A Amini… - arXiv preprint arXiv …, 2023 - arxiv.org
As autonomous driving technology matures, end-to-end methodologies have emerged as a
leading strategy, promising seamless integration from perception to control via deep …

Imp: Highly Capable Large Multimodal Models for Mobile Devices

Z Shao, Z Yu, J Yu, X Ouyang, L Zheng, Z Gai… - arXiv preprint arXiv …, 2024 - arxiv.org
By harnessing the capabilities of large language models (LLMs), recent large multimodal
models (LMMs) have shown remarkable versatility in open-world multimodal understanding …

NExT-GPT: Any-to-any multimodal LLM

S Wu, H Fei, L Qu, W Ji, TS Chua - arXiv preprint arXiv:2309.05519, 2023 - arxiv.org
While Multimodal Large Language Models (MM-LLMs) have recently made exciting strides,
they mostly fall prey to the limitation of input-side-only multimodal understanding, without the …

OpenAnnotate2: Multi-Modal Auto-Annotating for Autonomous Driving

Y Zhou, L Cai, X Cheng, Q Zhang, X Xue… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The demand for high-quality annotated data has surged in recent years for applications
driven by real-world artificial intelligence, such as autonomous driving and embodied …

Gemini in reasoning: Unveiling commonsense in multimodal large language models

Y Wang, Y Zhao - arXiv preprint arXiv:2312.17661, 2023 - arxiv.org
The burgeoning interest in Multimodal Large Language Models (MLLMs), such as OpenAI's
GPT-4V(ision), has significantly impacted both academic and industrial realms. These …

Efficient multimodal large language models: A survey

Y Jin, J Li, Y Liu, T Gu, K Wu, Z Jiang, M He… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated
remarkable performance in tasks such as visual question answering, visual understanding …