Semantic-rebased cross-modal hashing for scalable unsupervised text-visual retrieval

X Xia, G Dong, F Li, L Zhu, X Ying - Information Fusion, 2023 - Elsevier

Recent days witness significant progress in various multi-modal tasks made by Contrastive
Language-Image Pre-training (CLIP), a multi-modal large-scale model that learns visual …

Cited by 10 · Related articles · All 3 versions

Unifying knowledge iterative dissemination and relational reconstruction network for image–text matching

X Xie, Z Li, Z Tang, D Yao, H Ma - Information Processing & Management, 2023 - Elsevier

Image–text matching is a crucial branch in multimedia retrieval which relies on learning
inter-modal correspondences. Most existing methods focus on global or local correspondence …

Cited by 20 · Related articles · All 2 versions

Rare-aware attention network for image–text matching

Y Wang, Y Su, W Li, Z Sun, Z Wei, J Nie, X Li… - Information Processing & …, 2023 - Elsevier

Image and text matching bridges visual and textual modality differences and plays a
considerable role in cross-modal retrieval. Much progress has been achieved through …

Cited by 11 · Related articles · All 2 versions

Cross-modal image–text search via efficient discrete class alignment hashing

S Wang, H Zhao, Y Wang, J Huang, K Li - Information Processing & …, 2022 - Elsevier

Hashing has produced enormous potentials in cross-modal image–text search, which learns
compact binary codes by exploring the correlations between distinct modalities. However …

Cited by 14 · Related articles · All 2 versions

HVLM: Exploring human-like visual cognition and language-memory network for visual dialog

K Sun, C Guo, H Zhang, Y Li - Information Processing & Management, 2022 - Elsevier

Visual dialog, a visual-language task, enables an AI agent to engage in conversation with
humans grounded in a given image. To generate appropriate answers for a series of …

Cited by 9 · Related articles · All 2 versions

Enhanced deep discrete hashing with semantic-visual similarity for image retrieval

Z Yang, L Yang, W Huang, L Sun, J Long - Information Processing & …, 2021 - Elsevier

Hashing has been shown to be successful in a number of Approximate Nearest Neighbor
(ANN) domains, ranging from medicine, computer vision to information retrieval. However …

Cited by 18 · Related articles · All 2 versions

Learning double-level relationship networks for image captioning

C Wang, X Gu - Information Processing & Management, 2023 - Elsevier

Image captioning aims to generate descriptive sentences to describe image main contents.
Existing attention-based approaches mainly focus on the salient visual features in the …

Cited by 4 · Related articles · All 2 versions

Pseudo Label Association and Prototype-Based Invariant Learning for Semi-Supervised NIR-VIS Face Recognition

W Hu, Y Yang, H Hu - IEEE Transactions on Image Processing, 2024 - ieeexplore.ieee.org

Remarkable success of the existing Near-InfraRed and VISible (NIR-VIS) approaches owes
to sufficient labeled training data. However, collecting and tagging data from different …

Cited by 2 · Related articles · All 5 versions

Efficient discrete cross-modal hashing with semantic correlations and similarity preserving

F Yang, Q Zhang, F Ma, X Ding, Y Liu, D Tong - Information Sciences, 2023 - Elsevier

With its merits in query speed and memory footprint, hashing has elicited considerable
momentum in cross-media similarity retrieval applications. Many label-dependent supervised …

Cited by 3 · Related articles · All 2 versions

Attention-guided semantic hashing for unsupervised cross-modal retrieval

X Shen, H Zhang, L Li, L Liu - 2021 IEEE international …, 2021 - ieeexplore.ieee.org

Recently, due to the low storage consumption and high search efficiency of hashing
methods and the powerful feature extraction capability of deep neural networks, deep cross …

Cited by 11 · Related articles · All 3 versions
