Robust visual question answering: Datasets, methods, and future challenges

Citing articles: 3 results

From image to language: A critical analysis of visual question answering (vqa) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier

The multimodal task of Visual Question Answering (VQA) encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Cited by: 8

Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey

Q Lin, Y Zhu, X Mei, L Huang, J Ma, K He… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid development of artificial intelligence has constantly reshaped the field of
intelligent healthcare and medicine. As a vital technology, multimodal learning has …

Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

J Ma, M Hu, P Wang, W Sun, L Song, H Pei… - arXiv preprint arXiv …, 2024 - arxiv.org

Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task,
demanding intelligent systems to accurately respond to natural language queries based on …
