VQA and visual reasoning: An overview of recent datasets, methods and challenges

RY Zakari, JW Owusu, H Wang, K Qin, ZK Lawal… - arXiv preprint arXiv …, 2022 - arxiv.org
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent
years. This achievement can be ascribed in part to advances in AI subfields including …

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier
Abstract Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and aroused wide interests. In previous VQA methods …

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier
Abstract Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

A survey of efficient fine-tuning methods for Vision-Language Models—Prompt and Adapter

J Xing, J Liu, J Wang, L Sun, X Chen, X Gu… - Computers & Graphics, 2024 - Elsevier
Abstract Vision Language Model (VLM) is a popular research field located at the fusion of
computer vision and natural language processing (NLP). With the emergence of transformer …

An improved attention and hybrid optimization technique for visual question answering

H Sharma, AS Jalal - Neural Processing Letters, 2022 - Springer
Abstract In Visual Question Answering (VQA), an attention mechanism has a critical role in
specifying the different objects present in an image or tells the machine where to focus by …

Image captioning improved visual question answering

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer
Abstract Both Visual Question Answering (VQA) and image captioning are the problems
which involve Computer Vision (CV) and Natural Language Processing (NLP) domains. In …

Positional attention guided transformer-like architecture for visual question answering

A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …

Counting-based visual question answering with serial cascaded attention deep learning

T MeshuWelde, L Liao - Pattern Recognition, 2023 - Elsevier
The counting-based questions play a major part in Visual Question Answering (VQA), the
most challenging factor is counting the different objects present in the images. Recently …

Protein–ligand binding affinity prediction with edge awareness and supervised attention

Y Gu, X Zhang, A Xu, W Chen, K Liu, L Wu, S Mo, Y Hu… - iScience, 2023 - cell.com
Accurate prediction of protein–ligand binding affinity is crucial in structure-based drug
design but remains some challenges even with recent advances in deep learning: (1) …

A question-guided multi-hop reasoning graph network for visual question answering

Z Xu, J Gu, M Liu, G Zhou, H Fu, C Qiu - Information Processing & …, 2023 - Elsevier
Abstract Visual Question Answering (VQA) requires reasoning about the visually-grounded
relations in the image and question context. A crucial aspect of solving complex questions is …