Inflate and shrink: Enriching and reducing interactions for fast text-image retrieval

X Han, L Yu, X Zhu, L Zhang, YZ Song… - European conference on …, 2022 - Springer

Abstract: Large-scale Vision-and-Language (V+L) pre-training for representation learning
has proven to be effective in boosting various downstream V+L tasks. However, when it …

Cited by 43 · Related articles · All 7 versions

[PDF] thecvf.com

Robust cross-modal representation learning with progressive self-distillation

A Andonian, S Chen, R Hamid - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The learning objective of vision-language approach of CLIP does not effectively account for
the noisy many-to-many correspondences found in web-harvested image captioning …

Cited by 49 · Related articles · All 6 versions

[PDF] aclanthology.org

Cross-lingual cross-modal consolidation for effective multilingual video corpus moment retrieval

J Liu, T Yu, H Peng, M Sun, P Li - Findings of the Association for …, 2022 - aclanthology.org

Existing multilingual video corpus moment retrieval (mVCMR) methods are mainly based on
a two-stream structure. The visual stream utilizes the visual content in the video to estimate …

Cited by 19 · Related articles · All 3 versions

[PDF] arxiv.org

Balance act: Mitigating hubness in cross-modal retrieval with query and gallery banks

Y Wang, X Jian, B Xue - arXiv preprint arXiv:2310.11612, 2023 - arxiv.org

In this work, we present a post-processing solution to address the hubness problem in cross-
modal retrieval, a phenomenon where a small number of gallery data points are frequently …

Cited by 5 · Related articles · All 4 versions

[PDF] archive.org

Cross-probe BERT for fast cross-modal search

T Yu, H Fei, P Li - Proceedings of the 45th International ACM SIGIR …, 2022 - dl.acm.org

Owing to the effectiveness of cross-modal attentions, text-vision BERT models have
achieved excellent performance in text-image retrieval. Nevertheless, cross-modal …

Cited by 8 · Related articles · All 2 versions

[PDF] academia.edu

U-BERT for fast and scalable text-image retrieval

T Yu, H Fei, P Li - Proceedings of the 2022 ACM SIGIR International …, 2022 - dl.acm.org

Exploiting cross-modal attention on image region features and text features, cross-modal
BERT models have achieved higher accuracy than the embedding-based methods in cross …

Cited by 6 · Related articles · All 3 versions

[PDF] arxiv.org

Towards fast and accurate image-text retrieval with self-supervised fine-grained alignment

J Zhuang, J Yu, Y Ding, X Qu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Image-text retrieval requires the system to bridge the heterogenous gap between vision and
language for accurate retrieval while keeping the network lightweight-enough for efficient …

Cited by 4 · Related articles · All 4 versions

Multi-scale multi-modal dictionary BERT for effective text-image retrieval in multimedia advertising

T Yu, J Liu, Z Jin, Y Yang, H Fei, P Li - Proceedings of the 31st ACM …, 2022 - dl.acm.org

Visual content in multimedia advertising effectively attracts the customer's attention. Search-
based multimedia advertising is a cross-modal retrieval problem. Due to the modal gap …

Cited by 6 · Related articles

Fast, Accurate, and Lightweight Memory-Enhanced Embedding Learning Framework for Image-Text Retrieval

Z Li, L Zhang, K Zhang, Y Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Image-text retrieval is a fundamental task in bridging the semantics between vision and
language. The key challenge lies in accurately and efficiently learning the semantic …

Cited by 2 · Related articles

Texture BERT for cross-modal texture image retrieval

Z Xu, T Yu, P Li - Proceedings of the 31st ACM International Conference …, 2022 - dl.acm.org

We propose Texture BERT, a model describing visual attributes of texture using natural
language. To capture the rich details in texture images, we propose a group-wise compact …

Cited by 2 · Related articles
