Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier
Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

DilateFormer: Multi-scale dilated transformer for visual recognition

J Jiao, YM Tang, KY Lin, Y Gao, AJ Ma… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
As a de facto solution, the vanilla Vision Transformers (ViTs) are encouraged to model long-
range dependencies between arbitrary image patches while the global attended receptive …

Image forgery detection using deep learning by recompressing images

SS Ali, II Ganapathi, NS Vu, SD Ali, N Saxena… - Electronics, 2022 - mdpi.com
Capturing images has been increasingly popular in recent years, owing to the widespread
availability of cameras. Images are essential in our daily lives because they contain a wealth …

Geometry Attention Transformer with position-aware LSTMs for image captioning

C Wang, Y Shen, L Ji - Expert Systems with Applications, 2022 - Elsevier
In recent years, Transformer structures have been widely applied in image captioning with
impressive performance. However, previous works often neglect the geometry and position …

Hierarchical local-global transformer for temporal sentence grounding

X Fang, D Liu, P Zhou, Z Xu, R Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …

Semi-supervised medical report generation via graph-guided hybrid feature consistency

K Zhang, H Jiang, J Zhang, Q Huang… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Medical report generation generates the corresponding report according to the given
radiology image, which has been attracting increasing research interest. However, existing …

Visual cluster grounding for image captioning

W Jiang, M Zhu, Y Fang, G Shi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Attention mechanisms have been extensively adopted in vision and language tasks such as
image captioning. It encourages a captioning model to dynamically ground appropriate …

Image caption generation using visual attention prediction and contextual spatial relation extraction

R Sasibhooshan, S Kumaraswamy, S Sasidharan - Journal of Big Data, 2023 - Springer
Automatic caption generation with attention mechanisms aims at generating more
descriptive captions containing coarser to finer semantic contents in the image. In this work …

Transformer-based local-global guidance for image captioning

H Parvin, AR Naghsh-Nilchi, HM Mohammadi - Expert Systems with …, 2023 - Elsevier
Image captioning is a difficult problem for machine learning algorithms to compress huge
amounts of images into descriptive languages. The recurrent models are popularly used as …

HGAN: Hierarchical graph alignment network for image-text retrieval

J Guo, M Wang, Y Zhou, B Song, Y Chi… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Image-text retrieval (ITR) is a challenging task in the field of multimodal information
processing due to the semantic gap between different modalities. In recent years …