MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

X Cao, T Zhou, Y Ma, W Ye, C Cui… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language generative AI has demonstrated remarkable promise for empowering cross-
modal scene understanding of autonomous driving and high-definition (HD) map systems …

DriveVLM: The convergence of autonomous driving and large vision-language models

X Tian, J Gu, B Li, Y Liu, C Hu, Y Wang, K Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org
A primary hurdle of autonomous driving in urban environments is understanding complex
and long-tail scenarios, such as challenging road conditions and delicate human behaviors …

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

X Ding, J Han, H Xu, X Liang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rise of multimodal large language models (MLLMs) has spurred interest in language-
based driving tasks. However, existing research typically focuses on limited tasks and often …

HiLM-D: Towards high-resolution understanding in multimodal large language models for autonomous driving

X Ding, J Han, H Xu, W Zhang, X Li - arXiv preprint arXiv:2309.05186, 2023 - arxiv.org
Autonomous driving systems generally employ separate models for different tasks, resulting
in intricate designs. For the first time, we leverage singular multimodal large language …

On the road with GPT-4V(ision): Early explorations of visual-language model on autonomous driving

L Wen, X Yang, D Fu, X Wang, P Cai, X Li, T Ma… - arXiv preprint arXiv …, 2023 - arxiv.org
The pursuit of autonomous driving technology hinges on the sophisticated integration of
perception, decision-making, and control systems. Traditional approaches, both data-driven …

Automated evaluation of large vision-language models on self-driving corner cases

Y Li, W Zhang, K Chen, Y Liu, P Li, R Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs), owing to their remarkable visual reasoning ability to
understand images and videos, have received widespread attention in the autonomous …

Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving

V Dewangan, T Choudhary, S Chandhok… - arXiv preprint arXiv …, 2023 - arxiv.org
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps
in autonomous driving contexts. While existing perception systems for autonomous driving …

VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding

G Liao, J Li, X Ye - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Vision and language foundation models (VLMs) have showcased impressive capabilities in
2D scene understanding. However, their latent potential in elevating the understanding of …

DriveGPT4: Interpretable end-to-end autonomous driving via large language model

Z Xu, Y Zhang, E Xie, Z Zhao, Y Guo, KKY Wong… - arXiv preprint arXiv …, 2023 - arxiv.org
In the past decade, autonomous driving has experienced rapid development in both
academia and industry. However, its limited interpretability remains a significant unsolved …

Embodied understanding of driving scenarios

Y Zhou, L Huang, Q Bu, J Zeng, T Li, H Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied scene understanding serves as the cornerstone for autonomous agents to
perceive, interpret, and respond to open driving scenarios. Such understanding is typically …