RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-Grained Correctional Human Feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …

Receive, Reason, and React: Drive as You Say, with Large Language Models in Autonomous Vehicles

C Cui, Y Ma, X Cao, W Ye… - IEEE Intelligent …, 2024 - ieeexplore.ieee.org
The fusion of human-centric design and artificial intelligence capabilities has opened up
new possibilities for next-generation autonomous vehicles that go beyond traditional …

GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

H Lu, X Niu, J Wang, Y Wang, Q Hu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) are designed to process and integrate
information from multiple sources, such as text, speech, images, and videos. Despite their …

Towards Knowledge-Driven Autonomous Driving

X Li, Y Bai, P Cai, L Wen, D Fu, B Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper explores the emerging knowledge-driven autonomous driving technologies. Our
investigation highlights the limitations of current autonomous driving systems, in particular …

CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario

Z Duan, H Cheng, D Xu, X Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis
plays a pivotal role in applications ranging from insurance inspection to accident prevention …

MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

X Cao, T Zhou, Y Ma, W Ye, C Cui… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language generative AI has demonstrated remarkable promise for empowering cross-
modal scene understanding of autonomous driving and high-definition (HD) map systems …

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

Y Li, W Zhang, K Chen, Y Liu, P Li, R Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs), due to their remarkable visual reasoning ability to
understand images and videos, have received widespread attention in the autonomous …

Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving

B Yang, H Su, N Gkanatsios, TW Ke… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models excel at modeling complex and multimodal trajectory distributions for
decision-making and control. Reward-gradient guided denoising has been recently …

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

S Nasiriany, F Xia, W Yu, T Xiao, J Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision language models (VLMs) have shown impressive capabilities across a variety of
tasks, from logical reasoning to visual understanding. This opens the door to richer …

Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey

M Xu, D Niyato, J Kang, Z Xiong, A Jamalipour… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of
intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets …