Y Guo, L Gao, X Wang, Y Hu, X Xu… - Proceedings of the …, 2021 - openaccess.thecvf.com
The scene graph generation (SGG) task aims to detect visual relationship triplets, ie, subject, predicate, object, in an image, providing a structural vision layout for scene understanding …
We propose Visual News Captioner, an entity-aware model for the task of news image captioning. We also introduce Visual News, a large-scale benchmark consisting of more …
A Tran, A Mathews, L Xie - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
We propose an end-to-end model which generates captions for images embedded in news articles. News images present two key challenges: they rely on real-world knowledge …
Z Shi, X Zhou, X Qiu, X Zhu - arXiv preprint arXiv:2006.11807, 2020 - arxiv.org
Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a …
Questions regarding implicitness, ambiguity and underspecification are crucial for understanding the task validity and ethical concerns of multimodal image+ text systems, yet …
W Zhao, X Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article. This task remains …
Abstract Natural Language Understanding (NLU) and Natural Language Generation (NLG) are the two critical components of every conversational system that handles the task of …
Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only used outcome of an expensive human rating …
Automatic image captioning has improved significantly over the last few years, but the problem is far from being solved, with state of the art models still often producing low quality …