Cross-modal retrieval methods are the preferred tool to search databases for the text that best matches a query image, and vice versa. However, image-text retrieval models commonly …
With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval methods struggle to meet the needs of users demanding access to data from various …
S Li, L Sun, Q Li - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Pre-trained vision-language models like CLIP have recently shown superior performances on various downstream tasks, including image classification and segmentation. However, in …
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …
The advancement of zero-shot learning in the medical domain has been driven forward by pre-trained models on large-scale image-text pairs focusing on image-text …
Fashion vision-language pre-training models have shown efficacy for a wide range of downstream tasks. However, general vision-language pre-training models pay less attention …
Image aesthetics assessment (IAA) aims at predicting the aesthetic quality of images. Recently, large pre-trained vision-language models, like CLIP, have shown impressive …
Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL·E 2, and Stable Diffusion. However, the connection between text and other …
Advances in self-supervised learning, especially in contrastive learning, have drawn attention to investigating these techniques in providing effective visual representations from …
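Several of the snippets above reference CLIP-style contrastive models, where images and texts are embedded into a shared space and retrieval reduces to cosine similarity between L2-normalized embeddings. Below is a minimal sketch of that retrieval step using NumPy; the embeddings here are synthetic stand-ins (real CLIP embeddings are 512- or 768-dimensional and come from the model's encoders), so only the ranking mechanics are illustrated.

```python
import numpy as np

def normalize(x):
    # L2-normalize rows so that dot products become cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical pre-computed embeddings: 4 images and their 4 paired
# captions in a shared 8-dim space (synthetic, for illustration only).
rng = np.random.default_rng(0)
image_emb = normalize(rng.normal(size=(4, 8)))
# Each caption embedding is a lightly perturbed copy of its image's
# embedding, mimicking the alignment contrastive pre-training produces.
text_emb = normalize(image_emb + 0.05 * rng.normal(size=(4, 8)))

# Text-to-image retrieval: rank images by cosine similarity per query.
sim = text_emb @ image_emb.T       # (num_texts, num_images) similarity matrix
best_image = sim.argmax(axis=1)    # index of the top-ranked image per caption
print(best_image)
```

Because each synthetic caption sits close to its paired image, the argmax recovers the diagonal pairing; in a real system, the same matrix-multiply-then-argmax pattern runs over encoder outputs, and image-to-text retrieval is the transpose of this computation.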