A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Supervised Deep Learning Techniques for Image Description: A Systematic Review

M López-Sánchez, B Hernández-Ocaña… - Entropy, 2023 - mdpi.com
Automatic image description, also known as image captioning, aims to describe the
elements included in an image and their relationships. This task involves two research …

Uncertainty-aware image captioning

Z Fei, M Fan, L Zhu, J Huang, X Wei… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
It is well believed that the higher uncertainty in a word of the caption, the more
inter-correlated context information is required to determine it. However, current image captioning …

Simple token-level confidence improves caption correctness

S Petryk, S Whitehead, JE Gonzalez… - Proceedings of the …, 2024 - openaccess.thecvf.com
The ability to judge whether a caption correctly describes an image is a critical part of
vision-language understanding. However, state-of-the-art models often misinterpret the correctness …

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

CropCap: Embedding Visual Cross-Partition Dependency for Image Captioning

B Wang, Z Zhang, S Zhao, H Zhang, R Hong… - Proceedings of the 31st …, 2023 - dl.acm.org
Transformer-based approaches to image captioning have shown great success by utilizing
long-term dependency for visual embedding. However, their coarse long-term dependency …

Improving image captioning via predicting structured concepts

T Wang, W Chen, Y Tian, Y Song, Z Mao - arXiv preprint arXiv:2311.08223, 2023 - arxiv.org
Having the difficulty of solving the semantic gap between images and texts for the image
captioning task, conventional studies in this area paid some attention to treating semantic …

CAST: Cross-Modal Retrieval and Visual Conditioning for image captioning

S Cao, G An, Y Cen, Z Yang, W Lin - Pattern Recognition, 2024 - Elsevier
Image captioning requires not only accurate recognition of objects and corresponding
relationships, but also full comprehension of the scene information. However, existing …

Shifted Window Fourier Transform And Retention For Image Captioning

JC Hu, R Cavicchioli, A Capotondi - arXiv preprint arXiv:2408.13963, 2024 - arxiv.org
Image Captioning is an important Language and Vision task that finds application in a
variety of contexts, ranging from healthcare to autonomous vehicles. As many real-world …

Journal Homepage: www.journalijar.com

BN Brunda, MV Potdar, ML Santhosh, N Indu… - researchgate.net
Translation is necessary for the spreading new information, knowledge and ideas across the
world. Communication is very important to communicate between two countries or between …