Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Exploiting the Social-Like Prior in Transformer for Visual Reasoning

Y Han, Y Hu, X Song, H Tang, M Xu… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Benefiting from instrumental global dependency modeling of self-attention (SA), transformer-
based approaches have become the pivotal choices for numerous downstream visual …

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

X Dong, Z Feng, C Zhou, X Yu, M Yang… - Proceedings of the 47th …, 2024 - dl.acm.org
We present a Recipe for Effective and Efficient zero-shot video-text Retrieval, dubbed M2-
RAAP. Upon popular image-text models like CLIP, most current adaptation-based video-text …

Knowledge is power: Open-world knowledge representation learning for knowledge-based visual reasoning

W Zheng, L Yan, FY Wang - Artificial Intelligence, 2024 - Elsevier
Knowledge-based visual reasoning requires the ability to associate outside
knowledge that is not present in a given image for cross-modal visual understanding. Two …

HKFNet: Fine-Grained External Knowledge Fusion for Fact-Based Visual Question Answering

B Li, Y Sun, X Chen, L Xiangfeng - 2024 International Joint …, 2024 - ieeexplore.ieee.org
Fact-based Visual Question Answering (F-VQA) aims to answer questions with observed
images and external facts. The existing deep learning-based F-VQA methods still struggle to …

A Review of Recent Advances in Visual Question Answering: Capsule Networks and Vision Transformers in Focus

MS Prakash, SN Devananda - Indian Journal …, 2023 - sciresol.s3.us-east-2.amazonaws …
Objectives: Multimodal deep learning, incorporating images, text, videos, speech, and
acoustic signals, has grown significantly. This article aims to explore the untapped …

MSAM: Deep Semantic Interaction Network for Visual Question Answering

F Wang, B Wang, F Xu, J Li, P Liu - International Conference on …, 2023 - Springer
In Visual Question Answering (VQA) task, extracting semantic information from
multimodalities and effectively utilizing this information for interaction is crucial. Existing VQA …

Adaptive loose optimization for robust question answering

J Ma, P Wang, Z Wang, D Kong, M Hu, T Han… - arXiv preprint arXiv …, 2023 - arxiv.org
Question answering methods are well-known for leveraging data bias, such as the language
prior in visual question answering and the position bias in machine reading comprehension …

An Uncertainty-Aware Transfer Learning-Based Framework for COVID-19 Diagnosis

S Mishra - 2023 2nd International Conference on Futuristic …, 2023 - ieeexplore.ieee.org
More and more studies are looking at the use of dynamic off-chain transportation in IoT
systems that are based on payment channel networks (PCNs). Flexible routing in PCN …