Image captioning improved visual question answering

SR Waheed, MSM Rahim, NM Suaib… - Multimedia Tools and …, 2023 - Springer

In the computational science and engineering domains, the depiction of picture information
remains an intricate problem. Such a description needs an accurate recognition of various …

Cited by 40 · Related articles · All 4 versions

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier

Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

Cited by 38 · Related articles · All 2 versions

[PDF] arxiv.org

Surgical-VQA: Visual question answering in surgical scenes using transformer

L Seenivasan, M Islam, AK Krishna, H Ren - International Conference on …, 2022 - Springer

Visual question answering (VQA) in surgery is largely unexplored. Expert surgeons are
scarce and are often overloaded with clinical and academic workloads. This overload often …

Cited by 25 · Related articles · All 5 versions

[PDF] arxiv.org

Surgical-VQLA: Transformer with gated vision-language embedding for visual question localized-answering in robotic surgery

L Bai, M Islam, L Seenivasan… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Despite the availability of computer-aided simulators and recorded videos of surgical
procedures, junior residents still heavily rely on experts to answer their queries. However …

Cited by 15 · Related articles · All 5 versions

See and learn more: Dense caption-aware representation for visual question answering

Y Bi, H Jiang, Y Hu, Y Sun, B Yin - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the rapid development of deep learning models, great improvements have been
achieved in the Visual Question Answering (VQA) field. However, modern VQA models are …

Cited by 8 · Related articles · All 2 versions

[PDF] arxiv.org

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

Y Wang, P Sun, Y Li, H Zhang, D Hu - arXiv preprint arXiv:2407.10947, 2024 - arxiv.org

The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual
space using audio cues. However, in this work, it is recognized that previous AVS methods …

Cited by 3 · Related articles · All 2 versions

[PDF] ieee.org

A study of ConvNeXt architectures for enhanced image captioning

L Ramos, E Casas, C Romero… - IEEE …, 2024 - ieeexplore.ieee.org

This study explores the effectiveness of the ConvNeXt model, an advanced computer vision
architecture, in the task of image captioning. We integrated ConvNeXt with a Long Short …

Cited by 4 · Related articles · All 3 versions

Multilevel attention and relation network based image captioning model

H Sharma, S Srivastava - Multimedia Tools and Applications, 2023 - Springer

The aim of the image captioning task is to understand various semantic concepts such as
objects and their relationships in an image and combine them to generate a natural …

Cited by 12 · Related articles · All 4 versions

Improving visual question answering by combining scene-text information

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer

The text present in natural scenes contains semantic information about its surrounding
environment. For example, the majority of questions asked by blind people related to images …

Cited by 13 · Related articles · All 4 versions

[PDF] oulu.fi

Explainability in medical image captioning

R Beddiar, M Oussalah - Explainable Deep Learning AI, 2023 - Elsevier

Image captioning is the task of describing the content of the image using textual
representation. It has been used in many applications such as semantic tagging, image …

Cited by 12 · Related articles · All 4 versions
