From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

CLVIN: Complete language-vision interaction network for visual question answering

C Chen, D Han, X Shen - Knowledge-Based Systems, 2023 - Elsevier
The emergence of the Transformer optimizes the interactive modeling of multimodal
information in visual question answering (VQA) tasks, helping machines better understand …

Reliable visual question answering: Abstain rather than answer incorrectly

S Whitehead, S Petryk, V Shakib, J Gonzalez… - … on Computer Vision, 2022 - Springer
Machine learning has advanced dramatically, narrowing the accuracy gap to
humans in multimodal tasks like visual question answering (VQA). However, while humans …

A Bayesian theory of mind approach to modeling cooperation and communication

S Stacy, S Gong, A Parab, M Zhao… - Wiley Interdisciplinary …, 2024 - Wiley Online Library
Language has been widely acknowledged as the benchmark of intelligence.
However, evidence from cognitive science shows that intelligent behaviors in robust social …

Excalibur: Encouraging and evaluating embodied exploration

H Zhu, R Kapoor, SY Min, W Han, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Experience precedes understanding. Humans constantly explore and learn about their
environment out of curiosity, gather information, and update their models of the world. On the …

An optimized capsule neural networks for tomato leaf disease classification

LM Abouelmagd, MY Shams, HS Marie… - EURASIP Journal on …, 2024 - Springer
Plant diseases have a significant impact on leaves, with each disease exhibiting specific
spots characterized by unique colors and locations. Therefore, it is crucial to develop a …

Multilevel attention and relation network based image captioning model

H Sharma, S Srivastava - Multimedia Tools and Applications, 2023 - Springer
The aim of the image captioning task is to understand various semantic concepts such as
objects and their relationships in an image and combine them to generate a natural …

Dual-decoder transformer network for answer grounding in visual question answering

L Zhu, L Peng, W Zhou, J Yang - Pattern Recognition Letters, 2023 - Elsevier
Visual Question Answering (VQA) has made stunning advances by exploiting
Transformer architecture and large-scale visual-linguistic pretraining. State-of-the-art …

Improving visual question answering by combining scene-text information

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer
The text present in natural scenes contains semantic information about its surrounding
environment. For example, the majority of questions asked by blind people related to images …

Visual Concrete Bridge Defect Classification and Detection Using Deep Learning: A Systematic Review

D Amirkhani, MS Allili, L Hebbache… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Visual inspection is an important process for maintaining bridges in road transportation
systems, and preventing catastrophic events and tragedies. In this process, accurate and …