Large language models in healthcare and medical domain: A review

ZA Nazi, W Peng - Informatics, 2024 - mdpi.com
The deployment of large language models (LLMs) within the healthcare sector has sparked
both enthusiasm and apprehension. These models exhibit the remarkable ability to provide …

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

X Zhao, X Li, H Duan, H Huang, Y Li, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal large language models (MLLMs) have made significant strides in various visual
understanding tasks. However, the majority of these models are constrained to process low …

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

BK Lee, CW Kim, B Park, YM Ro - arXiv preprint arXiv:2405.15574, 2024 - arxiv.org
The rapid development of large language and vision models (LLVMs) has been driven by
advances in visual instruction tuning. Recently, open-source LLVMs have curated high …

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

L Meng, J Yang, R Tian, X Dai, Z Wu, J Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Most large multimodal models (LMMs) are implemented by feeding visual tokens as a
sequence into the first layer of a large language model (LLM). The resulting architecture is …
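
The snippet above describes the conventional LMM design in which all projected visual tokens are fed, together with the text embeddings, only into the first layer of the language model. As a rough illustration of that baseline (not the DeepStack method itself), the following PyTorch sketch uses hypothetical names (NaiveVisualPrefixLMM, projector, build_inputs) and assumed dimensions:

```python
# Minimal sketch of the conventional LMM input construction: visual features are
# projected into the LLM embedding space and concatenated as a flat prefix before
# the text tokens, so all visual tokens enter only at the first LLM layer.
# Module names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class NaiveVisualPrefixLMM(nn.Module):
    def __init__(self, vis_dim=1024, llm_dim=4096, vocab_size=32000):
        super().__init__()
        self.projector = nn.Linear(vis_dim, llm_dim)        # vision -> LLM embedding space
        self.tok_embed = nn.Embedding(vocab_size, llm_dim)  # LLM token embeddings
        # self.llm = ...  # transformer decoder layers of the language model, omitted here

    def build_inputs(self, vis_feats, text_ids):
        # vis_feats: (B, N_vis, vis_dim) patch features from a frozen vision encoder
        # text_ids:  (B, N_txt) tokenized prompt
        vis_tokens = self.projector(vis_feats)              # (B, N_vis, llm_dim)
        txt_tokens = self.tok_embed(text_ids)               # (B, N_txt, llm_dim)
        # Visual tokens are prepended as a sequence; the LLM sees one long token stream.
        return torch.cat([vis_tokens, txt_tokens], dim=1)   # (B, N_vis + N_txt, llm_dim)
```

The sketch only shows the input-construction step; DeepStack's contribution, per the title, is to stack visual tokens more deeply rather than feed them all at the first layer.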

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

B Ma, Z Zong, G Song, H Li, Y Liu - arXiv preprint arXiv:2406.11831, 2024 - arxiv.org
Large language models (LLMs) based on decoder-only transformers have demonstrated
superior text understanding capabilities compared to CLIP and T5-series models. However …

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

M Shi, F Liu, S Wang, S Liao, S Radhakrishnan… - arXiv preprint arXiv …, 2024 - arxiv.org
The ability to accurately interpret complex visual information is a crucial topic of multimodal
large language models (MLLMs). Recent work indicates that enhanced visual perception …

Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

J Zhang, T Wang, H Zhang, P Lu, F Zheng - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have shown promising performance on a variety of
vision-language tasks. However, they remain susceptible to hallucinations, generating …

The Evolution of MoE: A Survey from Basics to Breakthroughs

A Vats, R Raja, V Jain, A Chadha - 2024 - researchgate.net
Authors' Contact Information: Arpita Vats, arpita.vats09@gmail.com, Santa Clara
University, Santa Clara, California, USA; Rahul Raja, Carnegie Mellon University …

The Evolution of Mixture of Experts: A Survey from Basics to Breakthroughs

A Vats, R Raja, V Jain, A Chadha - 2024 - preprints.org
The Mixture of Experts (MoE) architecture has evolved as a powerful and versatile
approach for improving the performance and efficiency of deep learning models. This survey …
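
As context for the architecture the survey covers, below is a minimal sparsely gated top-k MoE feed-forward layer in PyTorch. The class name TopKMoELayer, the expert sizes, and the routing loop are illustrative assumptions, not the survey's reference implementation:

```python
# Minimal sparsely gated Mixture-of-Experts feed-forward layer (illustrative sketch).
# A linear router scores the experts per token; each token is processed by its
# top-k experts and the outputs are combined with softmax-normalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (tokens, d_model)
        scores = self.gate(x)                                # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # top-k experts per token
        weights = F.softmax(topk_scores, dim=-1)             # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Usage: y = TopKMoELayer()(torch.randn(16, 512))
```

Only k experts run per token, which is the efficiency argument behind MoE: capacity grows with the number of experts while per-token compute stays roughly constant.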