Image captioning using transformer-based double attention network

F Daneshfar, A Bartani, P Lotfi - Engineering Applications of Artificial …, 2024 - Elsevier

Diffusion models are increasingly favored over traditional approaches like generative
adversarial networks (GANs) and auto-regressive transformers due to their remarkable …

Cited by 5

Deep hashing image retrieval based on hybrid neural network and optimized metric learning

X Xiao, S Cao, L Wang, S Cheng, E Yuan - Knowledge-Based Systems, 2024 - Elsevier

While transformers have indeed improved image retrieval accuracy in computer vision,
challenges persist, including insufficient and imbalanced feature extraction and the inability …

Cited by 7

Exploring refined dual visual features cross-combination for image captioning

J Hu, Z Li, Q Su, Z Tang, H Ma - Neural Networks, 2024 - Elsevier

For current image captioning tasks, Transformer-based encoders that encode region features
and grid features have become commonplace because of their multi-head self …

Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning

MB Hossen, Z Ye, A Abdussalam, SU Hassan - Engineering Applications of …, 2024 - Elsevier

Fine-grained image captioning with attribute information has garnered significant attention in
the realms of computer vision and natural language processing, demanding precise and …

A transformer based real-time photo captioning framework for visually impaired people with visual attention

AK Muhammed Kunju, S Baskar, S Zafar… - Multimedia Tools and …, 2024 - Springer

In recent years, transformer-based photo captioning frameworks have played a crucial role in
improving individuals' overall well-being, self-reliance, and inclusivity by giving them access …

Cited by 6

[PDF] openreview.net

Distilled Cross-Combination Transformer for Image Captioning with Dual Refined Visual Features

J Hu, Z Li - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org

Transformer-based encoders that encode both region and grid features are the preferred
choice for the image captioning task due to their multi-head self-attention mechanism. This …

Policy Learning-Based Image Captioning With Vision Transformer

NV Bathula, I Paleti, S Pagidi… - 2024 IEEE …, 2024 - ieeexplore.ieee.org

In light of the remarkable progress made in automated image caption generation, it is still
challenging to create captions that accurately reflect factual information and yet capture the …

[PDF] ssrn.com

Image Captioning Using Novel Multimodal Feature Fusion

S Zahra, M Iqbal, HU Rehman, AA Saleem - Adil Ali, Image Captioning … - papers.ssrn.com

Image captioning is a challenging task that involves generating natural language
descriptions of images by integrating visual and textual data. This study presents a novel …

[PDF] researchgate.net

[PDF] ПОБУДОВА МОДЕЛІ ОПИСУ ЗОБРАЖЕНЬ ДЛЯ ЗАДАЧІ РОЗПІЗНАВАННЯ ДОРОГОЦІННОСТЕЙ [Building an Image Captioning Model for the Task of Recognizing Valuables]

АС Коваленко - The X International Scientific and Practical Conference … - researchgate.net

Image captioning is a dynamic field that combines computer vision and natural language
processing to automatically generate textual descriptions of images. Its main …

[PDF] dergipark.org.tr

Exploring Bengali Image Descriptions through the combination of diverse CNN Architectures and Transformer Decoders

B Patra, DR Kisku - Turkish Journal of Engineering, 2025 - dergipark.org.tr

In recent years, there has been growing interest among researchers in the field of image
captioning, which involves generating one or more descriptions for an image that closely …
