Memorize, associate and match: Embedding enhancement via fine-grained alignment for image-text retrieval

J Li, L Liu, L Niu, L Zhang - IEEE Transactions on Image …, 2021 - ieeexplore.ieee.org
Image-text retrieval aims to capture the semantic correlation between images and texts.
Existing image-text retrieval methods can be roughly categorized into embedding learning …

The image data and backbone in weakly supervised fine-grained visual categorization: A revisit and further thinking

S Ye, Y Wang, Q Peng, X You… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Weakly-supervised fine-grained visual categorization (FGVC) aims to achieve subclass
classification within the same large class using only label information. Compared to general …

Efficient semi-supervised multimodal hashing with importance differentiation regression

C Zheng, L Zhu, Z Zhang, J Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Multi-modal hashing learns compact binary hash codes by collaborating heterogeneous
multi-modal features at both the model training and online retrieval stages to support large …

Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning

Y Yang, J Guo, G Li, L Li, W Li, J Yang - Frontiers of Computer Science, 2024 - Springer
Traditional image-sentence cross-modal retrieval methods usually aim to learn consistent
representations of heterogeneous modalities, thereby to search similar instances in one …

Multi-view inter-modality representation with progressive fusion for image-text matching

J Wu, L Wang, C Chen, J Lu, C Wu - Neurocomputing, 2023 - Elsevier
Recently, image-text matching has been intensively explored to bridge vision and language.
Previous methods explore an inter-modality relationship between an image-text pair from …

Learning to disentangle and fuse for fine-grained multi-modality ship image retrieval

W Xiong, Z Xiong, P Xu, Y Cui, H Li, L Huang… - … Applications of Artificial …, 2024 - Elsevier
Multi-modality ship image retrieval aims to retrieve ship images from a large dataset,
encompassing various modalities, when provided with a query ship image. One of the key …

Adaptive Adversarial Learning based cross-modal retrieval

Z Li, H Lu, H Fu, Z Wang, G Gu - Engineering Applications of Artificial …, 2023 - Elsevier
There exists a heterogeneity gap between multi-modal data, hence it is difficult to directly
measure the similarity between them. A common way to solve the problem is representation …

Meta label associated loss for fine-grained visual recognition

Y Li, F Xiao, H Li, Q Li, S Yu - Science China Information Sciences, 2024 - Springer
Recently, intensive attempts have been made to design robust models for fine-grained
visual recognition, most notably the impressive gains for training with noisy labels by …

Multi-granularity Feature Interaction and Multi-region Selection based Triplet Visual Question Answering

H Liu, B Wang, Y Sun, J Gao, X Li… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Accurately locating the question-related regions in one given image is crucial for visual
question answering (VQA). The current approaches suffer two limitations: (1) Dividing one …