ALSA: adversarial learning of supervised attentions for visual question answering

RY Zakari, JW Owusu, H Wang, K Qin, ZK Lawal… - arXiv preprint arXiv …, 2022 - arxiv.org

Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent
years. This achievement can be ascribed in part to advances in AI subfields including …

Cited by 15 · Related articles · All 4 versions

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier

Abstract Visual Question Answering (VQA), an important task in understanding vision and
language, has been proposed and has aroused wide interest. In previous VQA methods …

Cited by 63 · Related articles · All 2 versions

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier

Abstract Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

Cited by 46 · Related articles · All 2 versions

A survey of efficient fine-tuning methods for Vision-Language Models—Prompt and Adapter

J Xing, J Liu, J Wang, L Sun, X Chen, X Gu… - Computers & Graphics, 2024 - Elsevier

Abstract Vision Language Model (VLM) is a popular research field located at the fusion of
computer vision and natural language processing (NLP). With the emergence of transformer …

Cited by 16 · Related articles

An improved attention and hybrid optimization technique for visual question answering

H Sharma, AS Jalal - Neural Processing Letters, 2022 - Springer

Abstract In Visual Question Answering (VQA), an attention mechanism has a critical role in
specifying the different objects present in an image or telling the machine where to focus by …

Cited by 38 · Related articles · All 3 versions

Image captioning improved visual question answering

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer

Abstract Both Visual Question Answering (VQA) and image captioning are problems
that involve the Computer Vision (CV) and Natural Language Processing (NLP) domains. In …

Cited by 38 · Related articles · All 4 versions

Positional attention guided transformer-like architecture for visual question answering

A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …

Cited by 18 · Related articles · All 2 versions

Counting-based visual question answering with serial cascaded attention deep learning

T MeshuWelde, L Liao - Pattern Recognition, 2023 - Elsevier

Counting-based questions play a major part in Visual Question Answering (VQA), and the
most challenging factor is counting the different objects present in the images. Recently …

Cited by 6 · Related articles · All 3 versions


Protein–ligand binding affinity prediction with edge awareness and supervised attention

Y Gu, X Zhang, A Xu, W Chen, K Liu, L Wu, S Mo, Y Hu… - iScience, 2023 - cell.com

Accurate prediction of protein–ligand binding affinity is crucial in structure-based drug
design but still faces some challenges even with recent advances in deep learning: (1) …

Cited by 12 · Related articles · All 8 versions

A question-guided multi-hop reasoning graph network for visual question answering

Z Xu, J Gu, M Liu, G Zhou, H Fu, C Qiu - Information Processing & …, 2023 - Elsevier

Abstract Visual Question Answering (VQA) requires reasoning about the visually-grounded
relations in the image and question context. A crucial aspect of solving complex questions is …

Cited by 12 · Related articles · All 2 versions
