J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
… for image captioning by utilizing pretrained language models (… image captioning. To our knowledge, this is the first work that … large pretrained language models for image captioning. …
J Gu, G Wang, J Cai, T Chen - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
… statistical language modeling tasks and shows competitive performance in image captioning… In this work, we present an image captioning model with language CNN to explore both …
… image captioning as a way to verbalize the information in the image, where the captions are … Once the captions are generated, all the inference in our method is done using text-only …
X Hu, Z Gan, J Wang, Z Yang, Z Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
… image captioning, provide a more comprehensive study on the scaling behavior via altering data and model … as predicting the next token with language modeling, as shown in Figure 4 …
… for image captioning with multimodal neural language model… with structure-content neural language model (SC-NLM). … which replaces feed-forward neural language model in [13]. …
… We observe that compared to the two cases where we do not use any pre-trained model or use only the pre-trained language model (i.e., BERT), using VLP significantly speeds up the …
… a language model to generate the image captions. The recently proposed CLIP model contains rich … it best for vision-language perception. Our key idea is that together with a pre-trained …
… Building on these developments, we propose to incorporate external language models into visual captioning frameworks to aid and improve their capabilities both for description …
N Rotstein, D Bensaïd, S Brody… - Proceedings of the …, 2024 - openaccess.thecvf.com
… To address this challenge, we leverage existing captions and explore augmenting them with … the original captions using a large language model (LLM), yielding comprehensive image …