From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

C Lu, C Qian, G Zheng, H Fan, H Gao, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in
generating reasonable responses to multi-modal content. However, there is …

AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning

S Hu, Z Fang, Z Fang, X Chen, Y Fang - arXiv preprint arXiv:2404.06345, 2024 - arxiv.org
Connected and autonomous driving has developed rapidly in recent years. However, current
autonomous driving systems, which are primarily based on data-driven approaches, exhibit …

LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models

M Peng, X Guo, X Chen, M Zhu, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
To ensure safe driving in dynamic environments, autonomous vehicles should possess the
capability to accurately predict the lane change intentions of surrounding vehicles in …

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

M Shukor, A Rame, C Dancette, M Cord - arXiv preprint arXiv:2310.00647, 2023 - arxiv.org
Following the success of Large Language Models (LLMs), Large Multimodal Models
(LMMs), such as the Flamingo model and its subsequent competitors, have started to …

ADriver-I: A General World Model for Autonomous Driving

F Jia, W Mao, Y Liu, Y Zhao, Y Wen, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Typically, autonomous driving adopts a modular design, which divides the full stack into
perception, prediction, planning and control parts. Though interpretable, such modular …

Ovis: Structural Embedding Alignment for Multimodal Large Language Model

S Lu, Y Li, QG Chen, Z Xu, W Luo, K Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM
with another pre-trained vision transformer through a connector, such as an MLP, endowing …

Zenseact Open Dataset: A Large-Scale and Diverse Multimodal Dataset for Autonomous Driving

M Alibeigi, W Ljungbergh, A Tonderski… - Proceedings of the …, 2023 - openaccess.thecvf.com
Existing datasets for autonomous driving (AD) often lack diversity and long-range
capabilities, focusing instead on 360° perception and temporal reasoning. To address this …

Large Language Models Powered Context-aware Motion Prediction

X Zheng, L Wu, Z Yan, Y Tang, H Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Motion prediction is among the most fundamental tasks in autonomous driving. Traditional
methods of motion forecasting primarily encode vector information of maps and historical …

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However, previous methods have …

Gated Recurrent Fusion to Learn Driving Behavior from Temporal Multimodal Data

A Narayanan, A Siravuru… - IEEE Robotics and …, 2020 - ieeexplore.ieee.org
The Tactical Driver Behavior modeling problem requires an understanding of driver actions
in complicated urban scenarios from rich multimodal signals including video, LiDAR and …