From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier
Image captioning, also called report generation in the medical field, aims to describe the visual
content of images in human language, which requires modeling semantic relationship …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

DeeCap: Dynamic early exiting for efficient image captioning

Z Fei, X Yan, S Wang, Q Tian - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Both accuracy and efficiency are crucial for image captioning in real-world scenarios.
Although Transformer-based models have gained significantly improved captioning …

Attention-aligned transformer for image captioning

Z Fei - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Recently, attention-based image captioning models, which are expected to ground correct
image regions for proper word generations, have achieved remarkable performance …

Semi-autoregressive transformer for image captioning

Y Zhou, Y Zhang, Z Hu, M Wang - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Current state-of-the-art image captioning models adopt autoregressive decoders, i.e. they
generate each word by conditioning on previously generated words, which leads to heavy …

Non-autoregressive image captioning with counterfactuals-critical multi-agent learning

L Guo, J Liu, X Zhu, X He, J Jiang, H Lu - arXiv preprint arXiv:2005.04690, 2020 - arxiv.org
Most image captioning models are autoregressive, i.e. they generate each word by
conditioning on previously generated words, which leads to heavy latency during inference …

Uncertainty-aware image captioning

Z Fei, M Fan, L Zhu, J Huang, X Wei… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
It is well believed that the higher uncertainty in a word of the caption, the more
inter-correlated context information is required to determine it. However, current image captioning …

Diffusion-RWKV: Scaling RWKV-like architectures for diffusion models

Z Fei, M Fan, C Yu, D Li, J Huang - arXiv preprint arXiv:2404.04478, 2024 - arxiv.org
Transformers have catalyzed advancements in computer vision and natural language
processing (NLP) fields. However, substantial computational complexity poses limitations for …

Memory-augmented image captioning

Z Fei - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Current deep learning-based image captioning systems have been proven to store practical
knowledge with their parameters and achieve competitive performances in the public …