DRSL: Deep relational similarity learning for cross-modal retrieval

X Wang, P Hu, L Zhen, D Peng - Information Sciences, 2021 - Elsevier
Cross-modal retrieval aims to retrieve relevant samples across different media modalities.
Existing cross-modal retrieval approaches are contingent on learning common …

Cross-modal retrieval with CNN visual features: A new baseline

Y Wei, Y Zhao, C Lu, S Wei, L Liu… - IEEE transactions on …, 2016 - ieeexplore.ieee.org
Recently, convolutional neural network (CNN) visual features have demonstrated their
powerful ability as a universal representation for various recognition tasks. In this paper …

Cross-modal retrieval: a systematic review of methods and future directions

F Li, L Zhu, T Wang, J Li, Z Zhang, HT Shen - arXiv preprint arXiv …, 2023 - arxiv.org
With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval
methods struggle to meet the needs of users demanding access to data from various …

EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval

H Ma, H Zhao, Z Lin, A Kale, Z Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
… recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …

COTS: Collaborative two-stream vision-language pre-training model for cross-modal retrieval

H Lu, N Fei, Y Huo, Y Gao, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Large-scale single-stream pre-training has shown dramatic performance in image-text
retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers …

Deep joint-semantics reconstructing hashing for large-scale unsupervised cross-modal retrieval

S Su, Z Zhong, C Zhang - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Cross-modal hashing encodes the multimedia data into a common binary hash space in
which the correlations among the samples from different modalities can be effectively …

On metric learning for audio-text cross-modal retrieval

X Mei, X Liu, J Sun, MD Plumbley, W Wang - arXiv preprint arXiv …, 2022 - arxiv.org
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates
given a query in another modality. Solving such a cross-modal retrieval task is challenging …

Mutual quantization for cross-modal search with noisy labels

E Yang, D Yao, T Liu, C Deng - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Deep cross-modal hashing has become an essential tool for supervised multimodal search.
These models tend to be optimized with large, curated multimodal datasets, where most …

Polysemous visual-semantic embedding for cross-modal retrieval

Y Song, M Soleymani - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com
Visual-semantic embedding aims to find a shared latent space where related visual and
textual instances are close to each other. Most current methods learn injective embedding …

VoP: Text-video co-operative prompt tuning for cross-modal retrieval

S Huang, B Gong, Y Pan, J Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Many recent studies leverage the pre-trained CLIP for text-video cross-modal retrieval by
tuning the backbone with additional heavy modules, which not only brings huge …