Efficient modeling of future context for image captioning

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Cited by 78 · Related articles · All 8 versions

[PDF] mdpi.com

Supervised Deep Learning Techniques for Image Description: A Systematic Review

M López-Sánchez, B Hernández-Ocaña… - Entropy, 2023 - mdpi.com

Automatic image description, also known as image captioning, aims to describe the
elements included in an image and their relationships. This task involves two research …

Cited by 7 · Related articles · All 9 versions

[PDF] aaai.org

Uncertainty-aware image captioning

Z Fei, M Fan, L Zhu, J Huang, X Wei… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

It is well believed that the higher uncertainty in a word of the caption, the more
inter-correlated context information is required to determine it. However, current image captioning …

Cited by 10 · Related articles · All 4 versions

[PDF] thecvf.com

Simple token-level confidence improves caption correctness

S Petryk, S Whitehead, JE Gonzalez… - Proceedings of the …, 2024 - openaccess.thecvf.com

The ability to judge whether a caption correctly describes an image is a critical part of
vision-language understanding. However, state-of-the-art models often misinterpret the correctness …

Cited by 5 · Related articles · All 5 versions

[PDF] aaai.org

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

Cited by 3 · Related articles

[PDF] archive.org

CropCap: Embedding Visual Cross-Partition Dependency for Image Captioning

B Wang, Z Zhang, S Zhao, H Zhang, R Hong… - Proceedings of the 31st …, 2023 - dl.acm.org

Transformer-based approaches to image captioning have shown great success by utilizing
long-term dependency for visual embedding. However, their coarse long-term dependency …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

Improving image captioning via predicting structured concepts

T Wang, W Chen, Y Tian, Y Song, Z Mao - arXiv preprint arXiv:2311.08223, 2023 - arxiv.org

Having the difficulty of solving the semantic gap between images and texts for the image
captioning task, conventional studies in this area paid some attention to treating semantic …

Cited by 7 · Related articles · All 4 versions

CAST: Cross-Modal Retrieval and Visual Conditioning for image captioning

S Cao, G An, Y Cen, Z Yang, W Lin - Pattern Recognition, 2024 - Elsevier

Image captioning requires not only accurate recognition of objects and corresponding
relationships, but also full comprehension of the scene information. However, existing …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

Shifted Window Fourier Transform And Retention For Image Captioning

JC Hu, R Cavicchioli, A Capotondi - arXiv preprint arXiv:2408.13963, 2024 - arxiv.org

Image Captioning is an important Language and Vision task that finds application in a
variety of contexts, ranging from healthcare to autonomous vehicles. As many real-world …

[PDF] researchgate.net

[PDF] Journal Homepage:-www. journalijar. com

BN Brunda, MV Potdar, ML Santhosh, N Indu… - researchgate.net

Translation is necessary for the spreading new information, knowledge and ideas across the
world. Communication is very important to communicate between two countries or between …
