Differential networks for visual question answering

W Chen, W Wang, L Liu, MS Lew - Neurocomputing, 2021 - Elsevier

The focus of this survey is on the analysis of two modalities of multimodal deep learning:
image and text. Unlike classic reviews of deep learning where monomodal image classifiers …

Cited by 38 Related articles All 6 versions

[PDF] thecvf.com

An improved attention for visual question answering

T Rahman, SH Chou, L Sigal… - Proceedings of the …, 2021 - openaccess.thecvf.com

We consider the problem of Visual Question Answering (VQA). Given an image and a free-
form, open-ended, question, expressed in natural language, the goal of VQA system is to …

Cited by 68 Related articles All 7 versions

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier

Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

Cited by 46 Related articles All 2 versions

[PDF] arxiv.org

Visual question answering using deep learning: A survey and performance analysis

Y Srivastava, V Murali, SR Dubey… - Computer Vision and …, 2021 - Springer

The Visual Question Answering (VQA) task combines challenges for processing
data with both Visual and Linguistic processing, to answer basic 'common sense' questions …

Cited by 71 Related articles All 4 versions

[PDF] arxiv.org

DialogueTRM: Exploring the intra- and inter-modal emotional behaviors in the conversation

Y Mao, Q Sun, G Liu, X Wang, W Gao, X Li… - arXiv preprint arXiv …, 2020 - arxiv.org

Emotion Recognition in Conversations (ERC) is essential for building empathetic human-
machine systems. Existing studies on ERC primarily focus on summarizing the context …

Cited by 54 Related articles All 5 versions

[HTML] sciencedirect.com

[HTML] A systematic evaluation of GPT-4V's multimodal capability for chest X-ray image analysis

Y Liu, Y Li, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - Meta-Radiology, 2024 - Elsevier

This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on
three representative tasks radiology report generation, medical visual question answering …

Cited by 5 Related articles

[PDF] archive.org

Multi-concept representation learning for knowledge graph completion

J Wang, B Wang, J Gao, Y Hu, B Yin - ACM Transactions on Knowledge …, 2023 - dl.acm.org

Knowledge Graph Completion (KGC) aims at inferring missing entities or relations by
embedding them in a low-dimensional space. However, most existing KGC methods …

Cited by 27 Related articles All 2 versions

[PDF] arxiv.org

Q2ATransformer: Improving medical VQA via an answer querying decoder

Y Liu, Z Wang, D Xu, L Zhou - International Conference on Information …, 2023 - Springer

Medical Visual Question Answering (VQA) systems play a supporting role to
understand clinic-relevant information carried by medical images. The questions to a …

Cited by 28 Related articles All 6 versions

[PDF] arxiv.org

Bilateral cross-modality graph matching attention for feature fusion in visual question answering

J Cao, X Qin, S Zhao, J Shen - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org

Answering semantically complicated questions according to an image is challenging in a
visual question answering (VQA) task. Although the image can be well represented by deep …

Cited by 30 Related articles All 6 versions

Global-local cross-view Fisher discrimination for view-invariant action recognition

L Gao, Y Ji, Y Yang, HT Shen - … of the 30th ACM International Conference …, 2022 - dl.acm.org

View change brings a significant challenge to action representation and recognition due to
pose occlusion and deformation. We propose a Global-Local Cross-View Fisher …

Cited by 11 Related articles
