Learning dual semantic relations with graph attention for image-text matching

K Wen, X Gu, Q Cheng - … on circuits and systems for video …, 2020 - ieeexplore.ieee.org
Image-Text Matching is one major task in cross-modal information processing. The main
challenge is to learn the unified visual and textual representations. Previous methods that …

Multi-scale fine-grained alignments for image and sentence matching

W Li, Y Wang, Y Su, X Li, AA Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Image and sentence matching is a critical task to bridge the visual and textual discrepancy
due to the heterogeneous modalities. Great progress has been made by exploring the …

Universal vision-language dense retrieval: Learning a unified representation space for multi-modal retrieval

Z Liu, C Xiong, Y Lv, Z Liu, G Yu - arXiv preprint arXiv:2209.00179, 2022 - arxiv.org
This paper presents Universal Vision-Language Dense Retrieval (UniVL-DR), which builds
a unified model for multi-modal retrieval. UniVL-DR encodes queries and multi-modality …

KM4: Visual reasoning via knowledge embedding memory model with mutual modulation

W Zheng, L Yan, C Gou, FY Wang - Information Fusion, 2021 - Elsevier
Visual reasoning is a special kind of visual question answering, which is essentially multi-
step and compositional, and also requires intensive text-visual interaction. The most …

Positional attention guided transformer-like architecture for visual question answering

A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …

Efficient semi-supervised multimodal hashing with importance differentiation regression

C Zheng, L Zhu, Z Zhang, J Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Multi-modal hashing learns compact binary hash codes by collaborating heterogeneous
multi-modal features at both the model training and online retrieval stages to support large …

Multi-view inter-modality representation with progressive fusion for image-text matching

J Wu, L Wang, C Chen, J Lu, C Wu - Neurocomputing, 2023 - Elsevier
Recently, image-text matching has been intensively explored to bridge vision and language.
Previous methods explore an inter-modality relationship between an image-text pair from …

Robust Commonsense Reasoning Against Noisy Labels Using Adaptive Correction

X Yang, C Deng, K Wei, D Tao - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Commonsense reasoning based on knowledge graphs (KGs) is a challenging task that
requires predicting complex questions over the described textual contexts and relevant …

Hybrid Intelligence and Knowledge Management: the human-machine symbiosis

HPV Machado, JF Calvi - Revista Gestão & …, 2023 - revistagt.fpl.emnuvens.com.br
The convergence of the human-machine relationship is associated with the concept of
hybrid intelligence, which has attracted growing interest in academia. Objective …

Cross-media web video topic detection based on heterogeneous interactive tensor learning

C Zhang, K Mei, X Xiao - Knowledge-Based Systems, 2024 - Elsevier
Topic detection based on text reasoning has attracted widespread attention. Existing
methods focus on inference based on textual semantic cues. However, each video is …