All versions - Academic Resource Search

4 results (0.02 seconds)

Distilled dual-encoder model for vision-language understanding

Z Wang, W Wang, H Zhu, M Liu, B Qin, F Wei - arXiv preprint arXiv …, 2021 - arxiv.org

We propose a cross-modal attention distillation framework to train a dual-encoder model for
vision-language understanding tasks, such as visual reasoning and visual question …

Cited by: 25

Distilled Dual-Encoder Model for Vision-Language Understanding

Z Wang, W Wang, H Zhu, M Liu, B Qin, F Wei - arXiv e-prints, 2021 - ui.adsabs.harvard.edu

We propose a cross-modal attention distillation framework to train a dual-encoder model for
vision-language understanding tasks, such as visual reasoning and visual question …

Distilled Dual-Encoder Model for Vision-Language Understanding

Z Wang, W Wang, H Zhu, M Liu, B Qin… - Proceedings of the 2022 …, 2022 - aclanthology.org

On vision-language understanding (VLU) tasks, fusion-encoder vision-language models
achieve superior results but sacrifice efficiency because of the simultaneous encoding of …

[PDF] Distilled Dual-Encoder Model for Vision-Language Understanding

Z Wang, W Wang, H Zhu, M Liu, B Qin, F Wei - researchgate.net

We propose a cross-modal attention distillation framework to train a dual-encoder model for
vision-language understanding tasks, such as visual reasoning and visual question …
