A survey on multimodal large language models for autonomous driving

C Cui, Y Ma, X Cao, W Ye, Y Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the emergence of Large Language Models (LLMs) and Vision Foundation Models
(VFMs), multimodal AI systems benefiting from large models have the potential to equally …

LaMPilot: An open benchmark dataset for autonomous driving with language model programs

Y Ma, C Cui, X Cao, W Ye, P Liu, J Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Autonomous driving (AD) has made significant strides in recent years. However, existing
frameworks struggle to interpret and execute spontaneous user instructions such as "…

DetGPT: Detect what you need via reasoning

R Pi, J Gao, S Diao, R Pan, H Dong, J Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, the field of computer vision has seen significant advancements thanks to the
development of large language models (LLMs). These models have enabled more effective …

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

X Ding, J Han, H Xu, X Liang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rise of multimodal large language models (MLLMs) has spurred interest in language-
based driving tasks. However, existing research typically focuses on limited tasks and often …

PerceptionGPT: Effectively fusing visual perception into LLM

R Pi, L Yao, J Gao, J Zhang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The integration of visual inputs with large language models (LLMs) has led to remarkable
advancements in multi-modal capabilities, giving rise to vision large language models …

Dolphins: Multimodal language model for driving

Y Ma, Y Cao, J Sun, M Pavone, C Xiao - arXiv preprint arXiv:2312.00438, 2023 - arxiv.org
The quest continues for fully autonomous vehicles (AVs) capable of navigating complex real-world
scenarios with human-like understanding and responsiveness. In this paper, we introduce …

Human-centric autonomous systems with LLMs for user command reasoning

Y Yang, Q Zhang, C Li, DS Marta… - Proceedings of the …, 2024 - openaccess.thecvf.com
Autonomous driving has made remarkable advancements in recent years,
evolving into a tangible reality. However, human-centric large-scale adoption hinges on …

MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

R Pi, T Han, Y Xie, R Pan, Q Lian, H Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
The deployment of multimodal large language models (MLLMs) has brought forth a unique
vulnerability: susceptibility to malicious attacks through visual inputs. We delve into the novel …

Empowering autonomous driving with large language models: A safety perspective

Y Wang, R Jiao, C Lang, SS Zhan, C Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Autonomous Driving (AD) faces crucial hurdles for commercial launch, notably
diminished public trust and safety concerns stemming from long-tail, unforeseen driving scenarios. This …

Embodied understanding of driving scenarios

Y Zhou, L Huang, Q Bu, J Zeng, T Li, H Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied scene understanding serves as the cornerstone for autonomous agents to
perceive, interpret, and respond to open driving scenarios. Such understanding is typically …