RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …

Receive, reason, and react: Drive as you say, with large language models in autonomous vehicles

C Cui, Y Ma, X Cao, W Ye… - IEEE Intelligent …, 2024 - ieeexplore.ieee.org
The fusion of human-centric design and artificial intelligence capabilities has opened up
new possibilities for next-generation autonomous vehicles that go beyond traditional …

PIVOT: Iterative visual prompting elicits actionable knowledge for VLMs

S Nasiriany, F Xia, W Yu, T Xiao, J Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision language models (VLMs) have shown impressive capabilities across a variety of
tasks, from logical reasoning to visual understanding. This opens the door to richer …

GPT4Vis: What can GPT-4 do for zero-shot visual recognition?

W Wu, H Yao, M Zhang, Y Song, W Ouyang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper does not present a novel method. Instead, it delves into an essential yet must-know baseline in light of the latest advancements in Generative Artificial Intelligence …

GPT as psychologist? Preliminary evaluations for GPT-4V on visual affective computing

H Lu, X Niu, J Wang, Y Wang, Q Hu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) are designed to process and integrate
information from multiple sources, such as text, speech, images, and videos. Despite their …

Towards knowledge-driven autonomous driving

X Li, Y Bai, P Cai, L Wen, D Fu, B Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper explores the emerging knowledge-driven autonomous driving technologies. Our
investigation highlights the limitations of current autonomous driving systems, in particular …

Data-centric evolution in autonomous driving: A comprehensive survey of big data system, data mining, and closed-loop technologies

L Li, W Shao, W Dong, Y Tian, K Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The aspiration of next-generation autonomous driving (AD) technology relies on the
dedicated integration and interaction among intelligent perception, prediction, planning, and …

Large multimodal agents: A survey

J Xie, Z Chen, R Zhang, X Wan, G Li - arXiv preprint arXiv:2402.15116, 2024 - arxiv.org
Large language models (LLMs) have achieved superior performance in powering text-
based AI agents, endowing them with decision-making and reasoning abilities akin to …

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

Scaffolding coordinates to promote vision-language coordination in large multi-modal models

X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …