Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

2 results (0.03 seconds)

[PDF] arxiv.org

DocKylin: A large multimodal model for visual document understanding with efficient visual slimming

J Zhang, W Yang, S Lai, Z Xie, L Jin - arXiv preprint arXiv:2406.19101, 2024 - arxiv.org

Current multimodal large language models (MLLMs) face significant challenges in visual
document understanding (VDU) tasks due to the high resolution, dense text, and complex …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

LLAVADI: What Matters For Multimodal Large Language Models Distillation

S Xu, X Li, H Yuan, L Qi, Y Tong, MH Yang - arXiv preprint arXiv …, 2024 - arxiv.org

The recent surge in Multimodal Large Language Models (MLLMs) has showcased their
remarkable potential for achieving generalized intelligence by integrating visual …
