From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most significantly and notably in the Computer Vision …

Textual context-aware dense captioning with diverse words

Z Shao, J Han, K Debattista… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Dense captioning generates more detailed spoken descriptions for complex visual scenes.
Despite several promising leads, existing methods still have two broad limitations: 1) The …

Segment and caption anything

X Huang, J Wang, Y Tang, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability
to generate regional captions. SAM presents strong generalizability to segment anything …

Neural attention for image captioning: review of outstanding methods

Z Zohourianshahzadi, JK Kalita - Artificial Intelligence Review, 2022 - Springer
Image captioning is the task of automatically generating sentences that describe an input
image in the best way possible. The most successful techniques for automatically generating …

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier
Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

PLIP: Language-image pre-training for person representation learning

J Zuo, C Yu, N Sang, C Gao - arXiv preprint arXiv:2305.08386, 2023 - arxiv.org
Pre-training has emerged as an effective technique for learning powerful person
representations. Most existing methods have shown that pre-training on pure-vision large …

DCMSTRD: end-to-end dense captioning via multi-scale transformer decoding

Z Shao, J Han, K Debattista… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Dense captioning creates diverse Region of Interests (RoIs) descriptions for complex visual
scenes. While promising results have been obtained, several issues persist. In particular: 1) …

IoT monitoring and prediction modeling of honeybee activity with alarm

N Andrijević, V Urošević, B Arsić, D Herceg, B Savić - Electronics, 2022 - mdpi.com
A significant number of recent scientific papers have raised awareness of changes in the
biological world of bees, problems with their extinction, and, as a consequence, their impact …

Image captioning improved visual question answering

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer
Both Visual Question Answering (VQA) and image captioning are the problems
which involve Computer Vision (CV) and Natural Language Processing (NLP) domains. In …