RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …

Receive, reason, and react: Drive as you say, with large language models in autonomous vehicles

C Cui, Y Ma, X Cao, W Ye… - IEEE Intelligent …, 2024 - ieeexplore.ieee.org
The fusion of human-centric design and artificial intelligence capabilities has opened up
new possibilities for next-generation autonomous vehicles that go beyond traditional …

PIVOT: Iterative visual prompting elicits actionable knowledge for VLMs

S Nasiriany, F Xia, W Yu, T Xiao, J Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision language models (VLMs) have shown impressive capabilities across a variety of
tasks, from logical reasoning to visual understanding. This opens the door to richer …

GPT as psychologist? Preliminary evaluations for GPT-4V on visual affective computing

H Lu, X Niu, J Wang, Y Wang, Q Hu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) are designed to process and integrate
information from multiple sources such as text, speech, images, and videos. Despite their …

Data-centric evolution in autonomous driving: A comprehensive survey of big data system, data mining, and closed-loop technologies

L Li, W Shao, W Dong, Y Tian, K Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The aspiration of the next generation's autonomous driving (AD) technology relies on the
dedicated integration and interaction among intelligent perception, prediction, planning, and …

Large multimodal agents: A survey

J Xie, Z Chen, R Zhang, X Wan, G Li - arXiv preprint arXiv:2402.15116, 2024 - arxiv.org
Large language models (LLMs) have achieved superior performance in powering text-
based AI agents, endowing them with decision-making and reasoning abilities akin to …

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

Scaffolding coordinates to promote vision-language coordination in large multi-modal models

X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …

CityLLaVA: Efficient fine-tuning for VLMs in city scenario

Z Duan, H Cheng, D Xu, X Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis
plays a pivotal role in applications ranging from insurance inspection to accident prevention …

Automated evaluation of large vision-language models on self-driving corner cases

Y Li, W Zhang, K Chen, Y Liu, P Li, R Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs), due to their remarkable visual reasoning ability to
understand images and videos, have received widespread attention in the autonomous …