查看文章

arxiv.org 中的 [PDF]

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

作者

Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Chenxu Hu, Yang Wang, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

发表日期

2024/2/19

期刊

arXiv preprint arXiv:2402.12289

简介

A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of chain-of-thought (CoT) modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that synergizes the strengths of DriveVLM with the traditional autonomous driving pipeline. DriveVLM-Dual achieves robust spatial understanding and real-time inference speed. Extensive experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the effectiveness of DriveVLM and the enhanced performance of DriveVLM-Dual, surpassing existing methods in complex and unpredictable driving conditions.

引用总数

被引用次数：19

202419

学术搜索中的文章

Drivevlm: The convergence of autonomous driving and large vision-language models

X Tian, J Gu, B Li, Y Liu, C Hu, Y Wang, K Zhan, P Jia… - arXiv preprint arXiv:2402.12289, 2024

被引用次数：19 相关文章所有 2 个版本