Image captioning by diffusion models: a survey

F Daneshfar, A Bartani, P Lotfi - Engineering Applications of Artificial …, 2024 - Elsevier
Diffusion models are increasingly favored over traditional approaches like generative
adversarial networks (GANs) and auto-regressive transformers due to their remarkable …

Deep hashing image retrieval based on hybrid neural network and optimized metric learning

X Xiao, S Cao, L Wang, S Cheng, E Yuan - Knowledge-Based Systems, 2024 - Elsevier
While transformers have indeed improved image retrieval accuracy in computer vision,
challenges persist, including insufficient and imbalanced feature extraction and the inability …

Exploring refined dual visual features cross-combination for image captioning

J Hu, Z Li, Q Su, Z Tang, H Ma - Neural Networks, 2024 - Elsevier
For current image captioning tasks, Transformer-based encoders have become commonplace for
encoding region features and grid features, because of their multi-head self …

Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning

MB Hossen, Z Ye, A Abdussalam, SU Hassan - Engineering Applications of …, 2024 - Elsevier
Fine-grained image captioning with attribute information has garnered significant attention in
the realms of computer vision and natural language processing, demanding precise and …

A transformer based real-time photo captioning framework for visually impaired people with visual attention

AK Muhammed Kunju, S Baskar, S Zafar… - Multimedia Tools and …, 2024 - Springer
In recent years, transformer-based photo captioning frameworks have played a crucial role in
improving individuals' overall well-being, self-reliance, and inclusivity by giving them access …

Distilled Cross-Combination Transformer for Image Captioning with Dual Refined Visual Features

J Hu, Z Li - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Transformer-based encoders that encode both region and grid features are the preferred
choice for the image captioning task due to their multi-head self-attention mechanism. This …

Policy Learning-Based Image Captioning With Vision Transformer

NV Bathula, I Paleti, S Pagidi… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
In light of the remarkable progress made in automated image caption generation, it is still
challenging to create captions that accurately reflect factual information and yet capture the …

Image Captioning Using Novel Multimodal Feature Fusion

S Zahra, M Iqbal, HU Rehman, AA Saleem - Adil Ali, Image Captioning … - papers.ssrn.com
Image captioning is a challenging task that involves generating natural language
descriptions of images by integrating visual and textual data. This study presents a novel …

Building an image description model for the task of recognizing valuables

АС Коваленко - The X International Scientific and Practical Conference … - researchgate.net
Image captioning is a dynamic field that combines computer vision and natural language
processing to automatically generate textual descriptions of images. Its main …

Exploring Bengali Image Descriptions through the combination of diverse CNN Architectures and Transformer Decoders

B Patra, DR Kisku - Turkish Journal of Engineering, 2025 - dergipark.org.tr
In recent years, there has been growing interest among researchers in the field of image
captioning, which involves generating one or more descriptions for an image that closely …