Visual–textual hybrid sequence matching for joint reasoning

K Wen, X Gu, Q Cheng - … on circuits and systems for video …, 2020 - ieeexplore.ieee.org

Image-Text Matching is one major task in cross-modal information processing. The main
challenge is to learn the unified visual and textual representations. Previous methods that …

Cited by 100 · Related articles · All 3 versions

Multi-scale fine-grained alignments for image and sentence matching

W Li, Y Wang, Y Su, X Li, AA Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Image and sentence matching is a critical task to bridge the visual and textual discrepancy
due to the heterogeneous modalities. Great progress has been made by exploring the …

Cited by 37 · Related articles

Universal vision-language dense retrieval: Learning a unified representation space for multi-modal retrieval

Z Liu, C Xiong, Y Lv, Z Liu, G Yu - arXiv preprint arXiv:2209.00179, 2022 - arxiv.org

This paper presents Universal Vision-Language Dense Retrieval (UniVL-DR), which builds
a unified model for multi-modal retrieval. UniVL-DR encodes queries and multi-modality …

Cited by 20 · Related articles · All 4 versions

KM4: Visual reasoning via knowledge embedding memory model with mutual modulation

W Zheng, L Yan, C Gou, FY Wang - Information Fusion, 2021 - Elsevier

Visual reasoning is a special kind of visual question answering, which is essentially multi-
step and compositional, and also requires intensive text-visual interaction. The most …

Cited by 33 · Related articles

Positional attention guided transformer-like architecture for visual question answering

A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …

Cited by 12 · Related articles · All 2 versions

Efficient semi-supervised multimodal hashing with importance differentiation regression

C Zheng, L Zhu, Z Zhang, J Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Multi-modal hashing learns compact binary hash codes by collaborating heterogeneous
multi-modal features at both the model training and online retrieval stages to support large …

Cited by 10 · Related articles · All 5 versions

Multi-view inter-modality representation with progressive fusion for image-text matching

J Wu, L Wang, C Chen, J Lu, C Wu - Neurocomputing, 2023 - Elsevier

Recently, image-text matching has been intensively explored to bridge vision and language.
Previous methods explore an inter-modality relationship between an image-text pair from …

Cited by 6 · Related articles · All 2 versions

Robust Commonsense Reasoning Against Noisy Labels Using Adaptive Correction

X Yang, C Deng, K Wei, D Tao - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Commonsense reasoning based on knowledge graphs (KGs) is a challenging task that
requires predicting complex questions over the described textual contexts and relevant …

Cited by 1 · Related articles · All 3 versions

Hybrid Intelligence and Knowledge Management: the human-machine symbiosis (Inteligência Híbrida e a Gestão do Conhecimento: a simbiose homem e máquina)

HPV Machado, JF Calvi - Revista Gestão & …, 2023 - revistagt.fpl.emnuvens.com.br

The convergence of the human-machine relationship is associated with the concept of
hybrid intelligence, which has attracted growing interest in academia. Objective …

Cited by 1 · Related articles

Cross-media web video topic detection based on heterogeneous interactive tensor learning

C Zhang, K Mei, X Xiao - Knowledge-Based Systems, 2024 - Elsevier

Topic detection based on text reasoning has attracted widespread attention. Existing
methods focus on inference based on textual semantic cues. However, each video is …

Cited by 1 · Related articles · All 2 versions