Cross-modal retrieval: a systematic review of methods and future directions

L Zhu, T Wang, F Li, J Li, Z Zhang, HT Shen - arXiv preprint arXiv …, 2023 - arxiv.org
With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval
methods struggle to meet the needs of users demanding access to data from various …

Cross-modal active complementary learning with self-refining correspondence

Y Qin, Y Sun, D Peng, JT Zhou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, image-text matching has attracted more and more attention from academia and
industry, which is fundamental to understanding the latent correspondence across visual …

Noisy-correspondence learning for text-to-image person re-identification

Y Qin, Y Chen, D Peng, X Peng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image person re-identification (TIReID) is a compelling topic in the cross-modal
community which aims to retrieve the target person based on a textual query. Although …

Robust object re-identification with coupled noisy labels

M Yang, Z Huang, X Peng - International Journal of Computer Vision, 2024 - Springer
In this paper, we reveal and study a new challenging problem faced by object Re-IDentification
(ReID), i.e., Coupled Noisy Labels (CNL) which refers to the Noisy Annotation …

SENet: spatial information enhancement for semantic segmentation neural networks

Y Huang, P Shi, H He, H He, B Zhao - The Visual Computer, 2024 - Springer
Image semantic segmentation is a basic task of computer vision, and plays an important role
in automatic driving, robot navigation and many other fields. However, the expensive …

Semantic-aware Contrastive Learning with Proposal Suppression for Video Semantic Role Grounding

M Liu, D Zhou, J Guo, X Luo, Z Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video semantic role grounding has gained substantial interest from both the academic and
industrial communities. While existing methods have demonstrated considerable …

Semi-supervised semi-paired cross-modal hashing

X Zhang, X Liu, X Nie, X Kang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Large-scale cross-modal hashing has drawn extensive attention due to its attractive
efficiency in both storage and retrieval. Existing methods exhibit poor performance when …

EDMC: efficient multi-view clustering via cluster and instance space learning

Y Qin, N Pu, H Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Multi-view subspace clustering aims to cluster the data lying in a union of subspaces with
low dimensions. The commonly used spectral clustering performs the final clustering based …

Dual-view curricular optimal transport for cross-lingual cross-modal retrieval

Y Wang, S Wang, H Luo, J Dong… - … on Image Processing, 2024 - ieeexplore.ieee.org
Current research on cross-modal retrieval is mostly English-oriented, as the availability of a
large number of English-oriented human-labeled vision-language corpora. In order to break …

Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval

F Zhang, XS Hua, C Chen… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper studies the problem of semi-supervised 2D-3D retrieval which aims to align both
labeled and unlabeled 2D and 3D data into the same embedding space. The problem is …