Meshed-memory transformer for image captioning

M Cornia, M Stefanini, L Baraldi… - Proceedings of the …, 2020 - openaccess.thecvf.com
… Their applicability to multi-modal contexts like image captioning, however, is still largely …
Transformer with Memory for Image Captioning. The architecture improves both the image

Image captioning through image transformer

S He, W Liao, HR Tavakoli, M Yang… - Proceedings of the …, 2020 - openaccess.thecvf.com
… for machine translation, where each transformer layer contains a single (… image
transformer for image captioning, where each transformer layer implements multiple sub-transformers

Entangled transformer for image captioning

G Li, L Zhu, P Liu, Y Yang - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
… In image captioning, the typical attention mechanisms are … are the dominating architectures
in image captioning. However, … In this paper, we investigate a Transformer-based sequence …

S2 Transformer for Image Captioning

P Zeng, H Zhang, J Song, L Gao - IJCAI, 2022 - ijcai.org
… and efficiently incorporate grid features with transformer-based architecture for image
captioning. To achieve this target, we propose an S2 Transformer, a simple yet effective approach …

Dual global enhanced transformer for image captioning

T Xian, Z Li, C Zhang, H Ma - Neural Networks, 2022 - Elsevier
… visual representation, which is essential to improve captioning performance. In GED, we first
… Hence, we regard the existing captions generated by the classical image captioning model …

ReFormer: The relational transformer for image captioning

X Yang, Y Liu, X Wang - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
… of image captioning, we propose a novel architecture, ReFormer, a RElational transFORMER
to … in the image. ReFormer incorporates the objective of scene graph generation with that of …

Dual-level collaborative transformer for image captioning

Y Luo, J Ji, X Sun, L Cao, Y Wu, F Huang… - Proceedings of the …, 2021 - ojs.aaai.org
… Dual-Level Collaborative Transformer In this section, we introduce a novel image captioning
model, named Dual-Level Collaborative Transformer, which uses both grid and region …

CPTR: Full transformer network for image captioning

W Liu, S Chen, L Guo, X Zhu, J Liu - arXiv preprint arXiv:2101.10804, 2021 - arxiv.org
… In this paper, we consider the image captioning task from a new sequence-to-sequence …
TransformeR (CPTR) which takes the sequentialized raw images as the input to Transformer. …
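The snippet says CPTR feeds "sequentialized raw images" to the Transformer instead of pre-extracted features. A minimal sketch of one way to sequentialize an image, assuming non-overlapping square patches flattened in raster order; the patch size, the nested-list H x W x C layout, and the function name `sequentialize` are illustrative assumptions, not details taken from the paper. A real model would additionally project each patch vector and add position embeddings, which are omitted here.

```python
from typing import List

# Hypothetical layout: image[row][col] is a list of C channel values.
Image = List[List[List[float]]]


def sequentialize(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles and
    flatten each tile into one vector, returning the tiles in raster order
    (left to right, then top to bottom)."""
    h = len(image)
    w = len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch size")
    sequence: List[List[float]] = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            token: List[float] = []
            # Concatenate the channel values of every pixel in this tile.
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    token.extend(image[r][c])
            sequence.append(token)
    return sequence


if __name__ == "__main__":
    # A 4 x 4 image with 3 channels, where each pixel stores its raster index.
    img = [[[float(r * 4 + c)] * 3 for c in range(4)] for r in range(4)]
    tokens = sequentialize(img, patch=2)
    print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 2 * 2 * 3 = 12
```

With a 2 x 2 patch, the 4 x 4 image becomes a sequence of four tokens, which is the kind of input an encoder can treat like a sentence of word embeddings.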

Explaining transformer-based image captioning models: An empirical analysis

M Cornia, L Baraldi, R Cucchiara - AI Communications, 2022 - content.iospress.com
Image Captioning is the task of translating an input image into a textual description. As
such… In this work, we focus on Transformer-based image captioning models and provide …

Semi-autoregressive transformer for image captioning

Y Zhou, Y Zhang, Z Hu, M Wang - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
… captioning models. We evaluate the SATIC model on the challenging MSCOCO [3] image
captioning … We present three examples of generated image captions in Figure 3. From the top …