CNN deep learning-based image to vector depiction

SR Waheed, MSM Rahim, NM Suaib… - Multimedia Tools and …, 2023 - Springer
In the computational science and engineering domains, the depiction of picture information
remains an intricate problem. Such a description needs an accurate recognition of various …

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier
Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

Surgical-VQA: Visual question answering in surgical scenes using transformer

L Seenivasan, M Islam, AK Krishna, H Ren - International Conference on …, 2022 - Springer
Visual question answering (VQA) in surgery is largely unexplored. Expert surgeons are
scarce and are often overloaded with clinical and academic workloads. This overload often …

Surgical-VQLA: Transformer with gated vision-language embedding for visual question localized-answering in robotic surgery

L Bai, M Islam, L Seenivasan… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Despite the availability of computer-aided simulators and recorded videos of surgical
procedures, junior residents still heavily rely on experts to answer their queries. However …

See and learn more: Dense caption-aware representation for visual question answering

Y Bi, H Jiang, Y Hu, Y Sun, B Yin - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the rapid development of deep learning models, great improvements have been
achieved in the Visual Question Answering (VQA) field. However, modern VQA models are …

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

Y Wang, P Sun, Y Li, H Zhang, D Hu - arXiv preprint arXiv:2407.10947, 2024 - arxiv.org
The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual
space using audio cues. However, in this work, it is recognized that previous AVS methods …

A study of ConvNeXt architectures for enhanced image captioning

L Ramos, E Casas, C Romero… - IEEE …, 2024 - ieeexplore.ieee.org
This study explores the effectiveness of the ConvNeXt model, an advanced computer vision
architecture, in the task of image captioning. We integrated ConvNeXt with a Long Short …

Multilevel attention and relation network based image captioning model

H Sharma, S Srivastava - Multimedia Tools and Applications, 2023 - Springer
The aim of the image captioning task is to understand various semantic concepts such as
objects and their relationships in an image and combine them to generate a natural …

Improving visual question answering by combining scene-text information

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer
The text present in natural scenes contains semantic information about its surrounding
environment. For example, the majority of questions asked by blind people related to images …

Explainability in medical image captioning

R Beddiar, M Oussalah - Explainable Deep Learning AI, 2023 - Elsevier
Image captioning is the task of describing the content of the image using textual
representation. It has been used in many applications such as semantic tagging, image …