EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval

H Ma, H Zhao, Z Lin, A Kale, Z Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
… recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …

COTS: Collaborative two-stream vision-language pre-training model for cross-modal retrieval

H Lu, N Fei, Y Huo, Y Gao, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Large-scale single-stream pre-training has shown dramatic performance in image-text
retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers …

IMRAM: Iterative matching with recurrent attention memory for cross-modal image-text retrieval

H Chen, G Ding, X Liu, Z Lin, J Liu… - Proceedings of the …, 2020 - openaccess.thecvf.com
Enabling bi-directional retrieval of images and texts is important for understanding the
correspondence between vision and language. Existing methods leverage the attention …

Deep evidential learning with noisy correspondence for cross-modal retrieval

Y Qin, D Peng, X Peng, X Wang, P Hu - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Cross-modal retrieval has been a compelling topic in the multimodal community. Recently,
to mitigate the high cost of data collection, the co-occurred pairs (e.g., image and text) could …

Learning cross-modal retrieval with noisy labels

P Hu, X Peng, H Zhu, L Zhen… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Recently, cross-modal retrieval is emerging with the help of deep multimodal learning.
However, even for unimodal data, collecting large-scale well-annotated data is expensive …

Polysemous visual-semantic embedding for cross-modal retrieval

Y Song, M Soleymani - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com
Visual-semantic embedding aims to find a shared latent space where related visual and
textual instances are close to each other. Most current methods learn injective embedding …

VoP: Text-video co-operative prompt tuning for cross-modal retrieval

S Huang, B Gong, Y Pan, J Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Many recent studies leverage the pre-trained CLIP for text-video cross-modal retrieval by
tuning the backbone with additional heavy modules, which not only brings huge …

ViSTA: Vision and scene text aggregation for cross-modal retrieval

M Cheng, Y Sun, L Wang, X Zhu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Visual appearance is considered to be the most important cue to understand images for
cross-modal retrieval, while sometimes the scene text appearing in images can provide …

Improving cross-modal retrieval with set of diverse embeddings

D Kim, N Kim, S Kwak - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Cross-modal retrieval across image and text modalities is a challenging task due to its
inherent ambiguity: An image often exhibits various situations, and a caption can be coupled …

Preserving semantic neighborhoods for robust cross-modal retrieval

C Thomas, A Kovashka - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
The abundance of multimodal data (e.g. social media posts) has inspired interest in
cross-modal retrieval methods. Popular approaches rely on a variety of metric learning losses …