Related articles - Academic Resource Search

EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval

H Ma, H Zhao, Z Lin, A Kale, Z Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Abstract recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …

Cited by 46 · Related articles · All 3 versions

[PDF] thecvf.com

COTS: Collaborative two-stream vision-language pre-training model for cross-modal retrieval

H Lu, N Fei, Y Huo, Y Gao, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Large-scale single-stream pre-training has shown dramatic performance in image-text
retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers …

Cited by 68 · Related articles · All 6 versions

[PDF] thecvf.com

IMRAM: Iterative matching with recurrent attention memory for cross-modal image-text retrieval

H Chen, G Ding, X Liu, Z Lin, J Liu… - Proceedings of the …, 2020 - openaccess.thecvf.com

Enabling bi-directional retrieval of images and texts is important for understanding the
correspondence between vision and language. Existing methods leverage the attention …

Cited by 377 · Related articles · All 7 versions

[PDF] pengxi.me

Deep evidential learning with noisy correspondence for cross-modal retrieval

Y Qin, D Peng, X Peng, X Wang, P Hu - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Cross-modal retrieval has been a compelling topic in the multimodal community. Recently,
to mitigate the high cost of data collection, the co-occurred pairs (e.g., image and text) could …

Cited by 45 · Related articles · All 2 versions

[PDF] thecvf.com

Learning cross-modal retrieval with noisy labels

P Hu, X Peng, H Zhu, L Zhen… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Recently, cross-modal retrieval is emerging with the help of deep multimodal learning.
However, even for unimodal data, collecting large-scale well-annotated data is expensive …

Cited by 94 · Related articles · All 7 versions

[PDF] thecvf.com

Polysemous visual-semantic embedding for cross-modal retrieval

Y Song, M Soleymani - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com

Visual-semantic embedding aims to find a shared latent space where related visual and
textual instances are close to each other. Most current methods learn injective embedding …

Cited by 295 · Related articles · All 8 versions

[PDF] thecvf.com

VoP: Text-video co-operative prompt tuning for cross-modal retrieval

S Huang, B Gong, Y Pan, J Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Many recent studies leverage the pre-trained CLIP for text-video cross-modal retrieval by
tuning the backbone with additional heavy modules, which not only brings huge …

Cited by 40 · Related articles · All 7 versions

[PDF] thecvf.com

ViSTA: Vision and scene text aggregation for cross-modal retrieval

M Cheng, Y Sun, L Wang, X Zhu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Visual appearance is considered to be the most important cue to understand images for
cross-modal retrieval, while sometimes the scene text appearing in images can provide …

Cited by 65 · Related articles · All 5 versions

[PDF] thecvf.com

Improving cross-modal retrieval with set of diverse embeddings

D Kim, N Kim, S Kwak - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Cross-modal retrieval across image and text modalities is a challenging task due to its
inherent ambiguity: An image often exhibits various situations, and a caption can be coupled …

Cited by 23 · Related articles · All 7 versions

[PDF] arxiv.org

Preserving semantic neighborhoods for robust cross-modal retrieval

C Thomas, A Kovashka - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer

The abundance of multimodal data (e.g., social media posts) has inspired interest in
cross-modal retrieval methods. Popular approaches rely on a variety of metric learning losses …

Cited by 41 · Related articles · All 8 versions
