Self-supervised correlation learning for cross-modal retrieval

SgVA-CLIP: Semantic-guided visual adapting of vision-language models for few-shot image classification

F Peng, X Yang, L Xiao, Y Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Although significant progress has been made in few-shot learning, most of existing few-shot
image classification methods require supervised pre-training on a large amount of samples …

Cited by 27 Related articles All 4 versions

Multimodal pre-training based on graph attention network for document understanding

Z Zhang, J Ma, J Du, L Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Document intelligence as a relatively new research topic supports many business
applications. Its main task is to automatically read, understand, and analyze documents …

Cited by 27 Related articles All 4 versions

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Z Zhao, M Chen, T Dai, J Yao, B Han… - Proceedings of the …, 2024 - openaccess.thecvf.com

Noisy correspondence that refers to mismatches in cross-modal data pairs is prevalent on
human-annotated or web-crawled datasets. Prior approaches to leverage such data mainly …

Cited by 1 Related articles All 3 versions

Image retrieval using unsupervised prompt learning and regional attention

BJ Zhang, GH Liu, Z Li - Expert Systems with Applications, 2024 - Elsevier

Identifying the target object in an image can produce more accurate and discriminative
feature representations, which can significantly improve large-scale instance-level image …

Cited by 2 Related articles

A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective

S Wang, L Zhu, L Shi, H Mo, S Tan - Applied Sciences, 2023 - mdpi.com

Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and
advance the field. Although previous reviews have primarily focused on binary and real …

Cited by 3 Related articles All 2 versions

Deep supervised multi-view learning with graph priors

P Hu, L Zhen, X Peng, H Zhu, J Lin… - … on Image Processing, 2023 - ieeexplore.ieee.org

This paper presents a novel method for supervised multi-view representation learning,
which projects multiple views into a latent common space while preserving the …

Cited by 3 Related articles All 7 versions

Deep multimodal learning for information retrieval

W Ji, Y Wei, Z Zheng, H Fei, T Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Information retrieval (IR) is a fundamental technique that aims to acquire information from a
collection of documents, web pages, or other sources. While traditional text-based IR has …

Cited by 5 Related articles All 3 versions

Conditional Video Diffusion Network for Fine-grained Temporal Sentence Grounding

D Liu, J Zhu, X Fang, Z Xiong, H Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding (TSG) aims to locate a semantically related segment of an
untrimmed video guided by a sentence query. Since the untrimmed videos are too long …

Cited by 1 Related articles

A semi-supervised cross-modal memory bank for cross-modal retrieval

Y Huang, B Hu, Y Zhang, C Gao, Q Wang - Neurocomputing, 2024 - Elsevier

The core of semi-supervised cross-modal retrieval tasks lies in leveraging limited supervised
information to measure the similarity between cross-modal data. Current approaches …

Cited by 2 Related articles

Deep Neighborhood-preserving Hashing with Quadratic Spherical Mutual Information for Cross-modal Retrieval

Q Qin, Y Huo, L Huang, J Dai, H Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Driven by the high nonlinearity of deep neural networks, deep hashing has achieved the
pictured great potential in cross-modal retrieval applications, significantly bridging the …

Cited by 1 Related articles
