GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models

H Liao, H Shen, Z Li, C Wang, G Li, Y Bie… - … in Transportation Research, 2024 - Elsevier
In the field of autonomous vehicles (AVs), accurately discerning commander intent and
executing linguistic commands within a visual context present a significant challenge. This …

Vision language models in autonomous driving and intelligent transportation systems

X Zhou, M Liu, BL Zagar, E Yurtsever… - arXiv preprint arXiv …, 2023 - arxiv.org
The applications of Vision-Language Models (VLMs) in the fields of Autonomous Driving
(AD) and Intelligent Transportation Systems (ITS) have attracted widespread attention due to …

DME-Driver: Integrating human decision logic and 3D scene perception in autonomous driving

W Han, D Guo, CZ Xu, J Shen - arXiv preprint arXiv:2401.03641, 2024 - arxiv.org
In the field of autonomous driving, two important features of autonomous vehicle systems
are the explainability of decision logic and the accuracy of environmental perception. This …

Vision language models in autonomous driving: A survey and outlook

X Zhou, M Liu, E Yurtsever, BL Zagar… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD)
have attracted widespread attention due to their outstanding performance and the ability to …

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

S Luo, W Chen, W Tian, R Liu, L Hou… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Foundation models have indeed made a profound impact on various fields, emerging as
pivotal components that significantly shape the capabilities of intelligent systems. In the …

Language-Image Models with 3D Understanding

JH Cho, B Ivanovic, Y Cao, E Schmerling… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety
of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and …

Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

R Guan, R Zhang, N Ouyang, J Liu, KL Man… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied perception is essential for intelligent vehicles and robots, enabling more natural
interaction and task execution. However, these advancements currently embrace vision …

Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding

Y Liu, B Sun, G Zheng, Y Wang, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
LiDAR sensors play a crucial role in various applications, especially in autonomous driving.
Current research primarily focuses on optimizing perceptual models with point cloud data as …