When CLIP meets cross-modal hashing retrieval: A new strong baseline

X Xia, G Dong, F Li, L Zhu, X Ying - Information Fusion, 2023 - Elsevier
Recent days witness significant progress in various multi-modal tasks made by Contrastive
Language-Image Pre-training (CLIP), a multi-modal large-scale model that learns visual …

Unifying knowledge iterative dissemination and relational reconstruction network for image–text matching

X Xie, Z Li, Z Tang, D Yao, H Ma - Information Processing & Management, 2023 - Elsevier
Image–text matching is a crucial branch in multimedia retrieval which relies on learning
inter-modal correspondences. Most existing methods focus on global or local correspondence …

Rare-aware attention network for image–text matching

Y Wang, Y Su, W Li, Z Sun, Z Wei, J Nie, X Li… - Information Processing & …, 2023 - Elsevier
Image and text matching bridges visual and textual modality differences and plays a
considerable role in cross-modal retrieval. Much progress has been achieved through …

Cross-modal image–text search via efficient discrete class alignment hashing

S Wang, H Zhao, Y Wang, J Huang, K Li - Information Processing & …, 2022 - Elsevier
Hashing has produced enormous potentials in cross-modal image–text search, which learns
compact binary codes by exploring the correlations between distinct modalities. However …

HVLM: Exploring human-like visual cognition and language-memory network for visual dialog

K Sun, C Guo, H Zhang, Y Li - Information Processing & Management, 2022 - Elsevier
Visual dialog, a visual-language task, enables an AI agent to engage in conversation with
humans grounded in a given image. To generate appropriate answers for a series of …

Enhanced deep discrete hashing with semantic-visual similarity for image retrieval

Z Yang, L Yang, W Huang, L Sun, J Long - Information Processing & …, 2021 - Elsevier
Hashing has been shown to be successful in a number of Approximate Nearest Neighbor
(ANN) domains, ranging from medicine, computer vision to information retrieval. However …

Learning double-level relationship networks for image captioning

C Wang, X Gu - Information Processing & Management, 2023 - Elsevier
Image captioning aims to generate descriptive sentences to describe image main contents.
Existing attention-based approaches mainly focus on the salient visual features in the …

Pseudo Label Association and Prototype-Based Invariant Learning for Semi-Supervised NIR-VIS Face Recognition

W Hu, Y Yang, H Hu - IEEE Transactions on Image Processing, 2024 - ieeexplore.ieee.org
Remarkable success of the existing Near-InfraRed and VISible (NIR-VIS) approaches owes
to sufficient labeled training data. However, collecting and tagging data from different …

Efficient discrete cross-modal hashing with semantic correlations and similarity preserving

F Yang, Q Zhang, F Ma, X Ding, Y Liu, D Tong - Information Sciences, 2023 - Elsevier
With its merits in query speed and memory footprint, hashing has elicited considerable
momentum in cross-media similarity retrieval applications. Many label-dependent supervised …

Attention-guided semantic hashing for unsupervised cross-modal retrieval

X Shen, H Zhang, L Li, L Liu - 2021 IEEE international …, 2021 - ieeexplore.ieee.org
Recently, due to the low storage consumption and high search efficiency of hashing
methods and the powerful feature extraction capability of deep neural networks, deep cross …