Optimizing image captioning: The effectiveness of vision transformers and VGG networks for remote sensing

H Han, B Oumarou Aboubakar, M Bhatti, BA Talpur… - Big Data Research, 2024 - Elsevier
This study presents a comprehensive evaluation of two prominent deep learning models,
Vision Transformer (ViT) and VGG16, within the domain of image captioning for remote …

A Survey on Automatic Image Captioning Approaches: Contemporary Trends and Future Perspectives

G Salgotra, P Abrol, A Selwal - Archives of Computational Methods in …, 2024 - Springer
The automatic generation of image captions is one of the complex computer vision tasks that
involve integration of object detection and natural language processing (NLP). In recent …

Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica

KS Shakya, A Alavi, J Porteous, P Khatri, A Laddi… - Applied Sciences, 2024 - mdpi.com
In orthodontics, the manual tracing of cephalometric radiographs is a common practice,
where the Sella Turcica (ST) serves as a reference point. The radiologist often manually …

Psychological analysis of house-tree-person drawings based on multimodal large models

D Xu, S Jiang, Y Zhang, X Li - Multimedia Systems, 2025 - Springer
Automatically generating descriptions for House-Tree-Person (HTP) drawings can effectively
reduce the subjective components in clinical assessments and assist clinical psychologists …

Recurrent Neural Networks for Image Captioning: A Case Study with LSTM

SS Mohite, C Suganthini… - Journal of …, 2024 - pdfs.semanticscholar.org
This research investigates the viability of Long Short-Term Memory (LSTM) systems, a
subtype of Recurrent Neural Networks (RNNs), for picture captioning. Leveraging the MS …

Image Captioning untuk Gambar Rambu Lalu Lintas Indonesia Menggunakan Pretrained CNN dan Transformer [Image Captioning for Indonesian Traffic Sign Images Using a Pretrained CNN and Transformer]

NP Aprilia, TH Rochadiani - The Indonesian Journal of Computer Science, 2024 - ijcs.net
This research aims to address the lack of understanding of traffic signs in Indonesia through
the development of an image captioning model using Inception V3 and Transformer. With …