GPT-4V Explorations: Mining Autonomous Driving

Z Li - arXiv preprint arXiv:2406.16817, 2024 - arxiv.org
This paper explores the application of the GPT-4V(ision) large visual language model to
autonomous driving in mining environments, where traditional systems often falter in …

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

L Wen, X Yang, D Fu, X Wang, P Cai, X Li, T Ma… - arXiv preprint arXiv …, 2023 - arxiv.org
The pursuit of autonomous driving technology hinges on the sophisticated integration of
perception, decision-making, and control systems. Traditional approaches, both data-driven …

On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent

L Wen, X Yang, D Fu, X Wang, P Cai, X Li… - ICLR 2024 Workshop …, 2024 - openreview.net
The development of autonomous driving technology depends on merging perception,
decision, and control systems. Traditional strategies have struggled to understand complex …

Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving

V Dewangan, T Choudhary, S Chandhok… - arXiv preprint arXiv …, 2023 - arxiv.org
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps
in autonomous driving contexts. While existing perception systems for autonomous driving …

GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events

X Zhou, AC Knoll - arXiv preprint arXiv:2402.02205, 2024 - arxiv.org
The recognition and understanding of traffic incidents, particularly traffic accidents, is a topic
of paramount importance in the realm of intelligent transportation systems and intelligent …

DriveVLM: The convergence of autonomous driving and large vision-language models

X Tian, J Gu, B Li, Y Liu, C Hu, Y Wang, K Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org
A primary hurdle of autonomous driving in urban environments is understanding complex
and long-tail scenarios, such as challenging road conditions and delicate human behaviors …

Reason2Drive: Towards interpretable and chain-based reasoning for autonomous driving

M Nie, R Peng, C Wang, X Cai, J Han… - arXiv preprint arXiv …, 2023 - s4plus.ustc.edu.cn
Large vision-language models (VLMs) have garnered increasing interest in autonomous
driving areas, due to their advanced capabilities in complex reasoning tasks essential for …

Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving

A Gopalkrishnan, R Greer, M Trivedi - arXiv preprint arXiv:2403.19838, 2024 - arxiv.org
Vision-Language Models (VLMs) and Multi-Modal Language models (MMLMs) have
become prominent in autonomous driving research, as these models can provide …

Dolphins: Multimodal language model for driving

Y Ma, Y Cao, J Sun, M Pavone, C Xiao - arXiv preprint arXiv:2312.00438, 2023 - arxiv.org
The quest for fully autonomous vehicles (AVs) capable of navigating complex real-world
scenarios with human-like understanding and responsiveness remains elusive. In this paper, we introduce …