From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier
Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

MirrorGAN: Learning text-to-image generation by redescription

T Qiao, J Zhang, D Xu, D Tao - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Generating an image from a given text description has two goals: visual realism and
semantic consistency. Although significant progress has been made in generating high …

Chinese image captioning via fuzzy attention-based DenseNet-BiLSTM

H Lu, R Yang, Z Deng, Y Zhang, G Gao… - ACM Transactions on …, 2021 - dl.acm.org
Chinese image description generation tasks usually have some challenges, such as single-
feature extraction, lack of global information, and lack of detailed description of the image …

Beyond a pre-trained object detector: Cross-modal textual and visual context for image captioning

CW Kuo, Z Kira - Proceedings of the IEEE/CVF conference …, 2022 - openaccess.thecvf.com
Significant progress has been made on visual captioning, largely relying on pre-trained
features and later fixed object detectors that serve as rich inputs to auto-regressive models …

Deconfounded image captioning: A causal retrospect

X Yang, H Zhang, J Cai - IEEE Transactions on Pattern …, 2021 - ieeexplore.ieee.org
Dataset bias in vision-language tasks is becoming one of the main problems which hinders
the progress of our community. Existing solutions lack a principled analysis about why …

NWPU-Captions dataset and MLCA-Net for remote sensing image captioning

Q Cheng, H Huang, Y Xu, Y Zhou, H Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, the burgeoning demands for captioning-related applications have inspired great
endeavors in the remote sensing community. However, current benchmark datasets are …

Reasoning visual dialogs with structural and partial observations

Z Zheng, W Wang, S Qi, SC Zhu - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We propose a novel model to address the task of Visual Dialog which exhibits complex
dialog structures. To obtain a reasonable answer based on the current question and the …

Multi-level policy and reward-based deep reinforcement learning framework for image captioning

N Xu, H Zhang, AA Liu, W Nie, Y Su… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Image captioning is one of the most challenging tasks in AI because it requires an
understanding of both complex visuals and natural language. Because image captioning is …

Dense relational captioning: Triple-stream networks for relationship-based captioning

DJ Kim, J Choi, TH Oh… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Our goal in this work is to train an image captioning model that generates more dense and
informative captions. We introduce "relational captioning," a novel image captioning task …