Parameterized Decision-making with Multi-modal Perception for Autonomous Driving

Y Xia, S Liu, Q Yu, L Deng, Y Zhang, H Su… - arXiv preprint arXiv …, 2023 - arxiv.org
Autonomous driving is an emerging technology that has advanced rapidly over the last
decade. Modern transportation is expected to benefit greatly from a wise decision-making …

AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving

AR Sekkat, R Mohan, O Sawade, E Matthes… - arXiv preprint arXiv …, 2023 - arxiv.org
Unlike humans, who can effortlessly estimate the entirety of objects even when partially
occluded, modern computer vision algorithms still find this aspect extremely challenging …

Exploring perceptual limitation of multimodal large language models

J Zhang, J Hu, M Khayatkhoei, F Ilievski… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual
capability in answering visual questions; however, little is known about the limits of their …

TinyGPT-V: Efficient multimodal large language model via small backbones

Z Yuan, Z Li, L Sun - arXiv preprint arXiv:2312.16862, 2023 - arxiv.org
In the era of advanced multimodal learning, multimodal large language models (MLLMs)
such as GPT-4V have made remarkable strides towards bridging language and visual …

Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models

X He, L Wei, L Xie, Q Tian - arXiv preprint arXiv:2401.03105, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) are experiencing rapid growth, yielding a
plethora of noteworthy contributions in recent months. The prevailing trend involves …

The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Pink: Unveiling the power of referential comprehension for multi-modal LLMs

S Xuan, Q Guo, M Yang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities
in various multi-modal tasks. Nevertheless, their performance in fine-grained image …

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Y Li, S Jiang, B Hu, L Wang, W Zhong, W Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Multimodal Large Language Models (MLLMs) underscore the
significance of scalable models and data to boost performance, yet this often incurs …

Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI

K Huang, B Yang, W Gao - arXiv preprint arXiv:2312.07886, 2023 - arxiv.org
Large Language Models (LLMs) are capable of reasoning over diverse input data modalities
through pre-trained encoders. However, the growing diversity of input data modalities …
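
The snippet describes the common pattern of attaching pre-trained modality encoders to an LLM. A minimal sketch of one assumed adapter design follows: a frozen encoder whose features are linearly projected into the LLM's embedding space, so a new modality can be attached by training only the projection. This is an illustration of the general pattern, not the paper's specific method; the encoder, dimensions, and names are hypothetical.

```python
# Illustrative modality adapter: frozen encoder + trainable projection into
# the LLM token-embedding space (assumed design, not the paper's method).
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    def __init__(self, encoder: nn.Module, enc_dim: int, llm_dim: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # keep the pre-trained encoder frozen
            p.requires_grad_(False)
        self.proj = nn.Linear(enc_dim, llm_dim)  # only this projection is trained

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(inputs)      # (batch, n_tokens, enc_dim)
        # Projected features can be prepended to the LLM's text-token embeddings.
        return self.proj(feats)               # (batch, n_tokens, llm_dim)

if __name__ == "__main__":
    # Stand-in "image encoder": a single linear layer over flattened patches.
    dummy_encoder = nn.Sequential(nn.Linear(768, 512))
    adapter = ModalityAdapter(dummy_encoder, enc_dim=512, llm_dim=1024)
    patches = torch.randn(2, 16, 768)         # 16 patch tokens per sample
    print(adapter(patches).shape)             # torch.Size([2, 16, 1024])
```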

SELMA: Semantic large-scale multimodal acquisitions in variable weather, daytime and viewpoints

P Testolina, F Barbato, U Michieli… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Accurate scene understanding from multiple sensors mounted on cars is a key requirement
for autonomous driving systems. Nowadays, this task is mainly performed through data …