A survey of methods, datasets and evaluation metrics for visual question answering

From image to language: A critical analysis of visual question answering (vqa) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier

The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Cited by 7 Related articles All 2 versions

CLVIN: Complete language-vision interaction network for visual question answering

C Chen, D Han, X Shen - Knowledge-Based Systems, 2023 - Elsevier

The emergence of the Transformer optimizes the interactive modeling of multimodal
information in visual question answering (VQA) tasks, helping machines better understand …

Cited by 39 Related articles All 3 versions

Reliable visual question answering: Abstain rather than answer incorrectly

S Whitehead, S Petryk, V Shakib, J Gonzalez… - … on Computer Vision, 2022 - Springer

Machine learning has advanced dramatically, narrowing the accuracy gap to
humans in multimodal tasks like visual question answering (VQA). However, while humans …

Cited by 37 Related articles All 6 versions

A Bayesian theory of mind approach to modeling cooperation and communication

S Stacy, S Gong, A Parab, M Zhao… - Wiley Interdisciplinary …, 2024 - Wiley Online Library

Language has been widely acknowledged as the benchmark of intelligence.
However, evidence from cognitive science shows that intelligent behaviors in robust social …

Cited by 2 Related articles All 2 versions

Excalibur: Encouraging and evaluating embodied exploration

H Zhu, R Kapoor, SY Min, W Han, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Experience precedes understanding. Humans constantly explore and learn about their
environment out of curiosity, gather information, and update their models of the world. On the …

Cited by 10 Related articles All 4 versions

An optimized capsule neural networks for tomato leaf disease classification

LM Abouelmagd, MY Shams, HS Marie… - EURASIP Journal on …, 2024 - Springer

Plant diseases have a significant impact on leaves, with each disease exhibiting specific
spots characterized by unique colors and locations. Therefore, it is crucial to develop a …

Cited by 11 Related articles All 7 versions

Multilevel attention and relation network based image captioning model

H Sharma, S Srivastava - Multimedia Tools and Applications, 2023 - Springer

The aim of the image captioning task is to understand various semantic concepts such as
objects and their relationships in an image and combine them to generate a natural …

Cited by 12 Related articles All 4 versions

Dual-decoder transformer network for answer grounding in visual question answering

L Zhu, L Peng, W Zhou, J Yang - Pattern Recognition Letters, 2023 - Elsevier

Visual Question Answering (VQA) has made stunning advances by exploiting
Transformer architecture and large-scale visual-linguistic pretraining. State-of-the-art …

Cited by 6 Related articles All 3 versions

Improving visual question answering by combining scene-text information

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer

The text present in natural scenes contains semantic information about its surrounding
environment. For example, the majority of questions asked by blind people related to images …

Cited by 13 Related articles All 4 versions

Visual Concrete Bridge Defect Classification and Detection Using Deep Learning: A Systematic Review

D Amirkhani, MS Allili, L Hebbache… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

Visual inspection is an important process for maintaining bridges in road transportation
systems, and preventing catastrophic events and tragedies. In this process, accurate and …

Cited by 2 Related articles All 2 versions
