Unlike humans, who can effortlessly estimate the entirety of objects even when partially occluded, modern computer vision algorithms still find this aspect extremely challenging …
Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions; however, little is known about the limits of their …
Z Yuan, Z Li, L Sun - arXiv preprint arXiv:2312.16862, 2023 - arxiv.org
In the era of advanced multimodal learning, multimodal large language models (MLLMs) such as GPT-4V have made remarkable strides towards bridging language and visual …
X He, L Wei, L Xie, Q Tian - arXiv preprint arXiv:2401.03105, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) are experiencing rapid growth, yielding a plethora of noteworthy contributions in recent months. The prevailing trend involves …
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …
S Xuan, Q Guo, M Yang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities in various multi-modal tasks. Nevertheless, their performance in fine-grained image …
Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs …
K Huang, B Yang, W Gao - arXiv preprint arXiv:2312.07886, 2023 - arxiv.org
Large Language Models (LLMs) are capable of reasoning over diverse input data modalities through pre-trained encoders. However, the growing diversity of input data modalities …
Accurate scene understanding from multiple sensors mounted on cars is a key requirement for autonomous driving systems. Nowadays, this task is mainly performed through data …