With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems benefiting from large models have the potential to equally …
H Shao, Y Hu, L Wang, G Song… - Proceedings of the …, 2024 - openaccess.thecvf.com
Despite significant recent progress in the field of autonomous driving modern methods still struggle and can incur serious accidents when encountering long-tail unforeseen events …
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems to boost generalization and enable interactivity with human …
Abstract Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation but their understanding of the 3D world is notably …
Y Ma, C Cui, X Cao, W Ye, P Liu, J Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Autonomous driving (AD) has made significant strides in recent years. However existing frameworks struggle to interpret and execute spontaneous user instructions such as" …
Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts …
Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their …
Y Hu, T Li, Q Lu, W Shao, J He… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However their potential in the medical domain …
The applications of Vision-Language Models (VLMs) in the fields of Autonomous Driving (AD) and Intelligent Transportation Systems (ITS) have attracted widespread attention due to …