SgVA-CLIP: Semantic-guided visual adapting of vision-language models for few-shot image classification

F Peng, X Yang, L Xiao, Y Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Although significant progress has been made in few-shot learning, most existing few-shot
image classification methods require supervised pre-training on a large number of samples …

Multimodal pre-training based on graph attention network for document understanding

Z Zhang, J Ma, J Du, L Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Document intelligence, a relatively new research topic, supports many business
applications. Its main task is to automatically read, understand, and analyze documents …

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Z Zhao, M Chen, T Dai, J Yao, B Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in
human-annotated or web-crawled datasets. Prior approaches to leverage such data mainly …

Image retrieval using unsupervised prompt learning and regional attention

BJ Zhang, GH Liu, Z Li - Expert Systems with Applications, 2024 - Elsevier
Identifying the target object in an image can produce more accurate and discriminative
feature representations, which can significantly improve large-scale instance-level image …

A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective

S Wang, L Zhu, L Shi, H Mo, S Tan - Applied Sciences, 2023 - mdpi.com
Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and
advance the field. Although previous reviews have primarily focused on binary and real …

Deep supervised multi-view learning with graph priors

P Hu, L Zhen, X Peng, H Zhu, J Lin… - … on Image Processing, 2023 - ieeexplore.ieee.org
This paper presents a novel method for supervised multi-view representation learning,
which projects multiple views into a latent common space while preserving the …

Deep multimodal learning for information retrieval

W Ji, Y Wei, Z Zheng, H Fei, T Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Information retrieval (IR) is a fundamental technique that aims to acquire information from a
collection of documents, web pages, or other sources. While traditional text-based IR has …

Conditional Video Diffusion Network for Fine-grained Temporal Sentence Grounding

D Liu, J Zhu, X Fang, Z Xiong, H Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding (TSG) aims to locate a semantically related segment of an
untrimmed video guided by a sentence query. Since the untrimmed videos are too long …

A semi-supervised cross-modal memory bank for cross-modal retrieval

Y Huang, B Hu, Y Zhang, C Gao, Q Wang - Neurocomputing, 2024 - Elsevier
The core of semi-supervised cross-modal retrieval tasks lies in leveraging limited supervised
information to measure the similarity between cross-modal data. Current approaches …

Deep Neighborhood-preserving Hashing with Quadratic Spherical Mutual Information for Cross-modal Retrieval

Q Qin, Y Huo, L Huang, J Dai, H Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Driven by the high nonlinearity of deep neural networks, deep hashing has shown
great potential in cross-modal retrieval applications, significantly bridging the …