MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

The (r)evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Towards knowledge-driven autonomous driving

X Li, Y Bai, P Cai, L Wen, D Fu, B Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper explores the emerging knowledge-driven autonomous driving technologies. Our
investigation highlights the limitations of current autonomous driving systems, in particular …

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model

D Muhtar, Z Li, F Gu, X Zhang, P Xiao - arXiv preprint arXiv:2402.02544, 2024 - arxiv.org
The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …

Rag-driver: Generalisable driving explanations with retrieval-augmented in-context learning in multi-modal large language model

J Yuan, S Sun, D Omeiza, B Zhao, P Newman… - arXiv preprint arXiv …, 2024 - arxiv.org
Robots powered by 'blackbox' models need to provide human-understandable explanations
which we can trust. Hence, explainability plays a critical role in trustworthy autonomous …

LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

D Fu, W Lei, L Wen, P Cai, S Mao, M Dou, B Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new
avenues in artificial intelligence, particularly for autonomous driving by offering enhanced …

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

S Luo, W Chen, W Tian, R Liu, L Hou… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Foundation models have indeed made a profound impact on various fields, emerging as
pivotal components that significantly shape the capabilities of intelligent systems. In the …

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

Q Wu, H Zhao, M Saxon, T Bui, WY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision language models (VLMs) are an exciting emerging class of language models (LMs)
that have merged classic LM capabilities with those of image processing systems. However …

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

C Pang, J Wu, J Li, Y Liu, J Sun, W Li, X Weng… - arXiv preprint arXiv …, 2024 - arxiv.org
Generic large Vision-Language Models (VLMs) are rapidly developing, but still perform
poorly in the Remote Sensing (RS) domain, which is due to the unique and specialized nature …