All versions - Academic Resource Search

Articles

3 results (0.03 seconds)

Monkey: Image resolution and text label are important things for large multi-modal models

Z Li, B Yang, Q Liu, Z Ma, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Large Multimodal Models (LMMs) have shown promise in vision-language tasks but
struggle with high-resolution input and detailed scene understanding. Addressing these …

Cited by: 66 · Related articles

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Z Li, B Yang, Q Liu, Z Ma, S Zhang, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Multimodal Models have demonstrated impressive capabilities in understanding
general vision-language tasks. However, due to the limitation of supported input resolution …

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Z Li, B Yang, Q Liu, Z Ma, S Zhang, J Yang… - arXiv e …, 2023 - ui.adsabs.harvard.edu

Abstract Large Multimodal Models (LMMs) have shown promise in vision-language tasks but
struggle with high-resolution input and detailed scene understanding. Addressing these …
