Informative image captioning with external sources of information

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 88 · Related articles · All 5 versions

From general to specific: Informative scene graph generation via balance adjustment

Y Guo, L Gao, X Wang, Y Hu, X Xu… - Proceedings of the …, 2021 - openaccess.thecvf.com

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject,
predicate, object, in an image, providing a structural vision layout for scene understanding …

Cited by 85 · Related articles · All 5 versions

Visual news: Benchmark and challenges in news image captioning

F Liu, Y Wang, T Wang, V Ordonez - arXiv preprint arXiv:2010.03743, 2020 - arxiv.org

We propose Visual News Captioner, an entity-aware model for the task of news image
captioning. We also introduce Visual News, a large-scale benchmark consisting of more …

Cited by 95 · Related articles · All 4 versions

Transform and tell: Entity-aware news image captioning

A Tran, A Mathews, L Xie - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com

We propose an end-to-end model which generates captions for images embedded in news
articles. News images present two key challenges: they rely on real-world knowledge …

Cited by 100 · Related articles · All 7 versions

Improving image captioning with better use of captions

Z Shi, X Zhou, X Qiu, X Zhu - arXiv preprint arXiv:2006.11807, 2020 - arxiv.org

Image captioning is a multimodal problem that has drawn extensive attention in both the
natural language processing and computer vision community. In this paper, we present a …

Cited by 91 · Related articles · All 3 versions

Underspecification in scene description-to-depiction tasks

B Hutchinson, J Baldridge, V Prabhakaran - arXiv preprint arXiv …, 2022 - arxiv.org

Questions regarding implicitness, ambiguity and underspecification are crucial for
understanding the task validity and ethical concerns of multimodal image+text systems, yet …

Cited by 29 · Related articles · All 5 versions

Boosting entity-aware image captioning with multi-modal knowledge graph

W Zhao, X Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org

Entity-aware image captioning aims to describe named entities and events related to the
image by utilizing the background knowledge in the associated article. This task remains …

Cited by 41 · Related articles · All 5 versions

A unified framework for slot based response generation in a multimodal dialogue system

M Firdaus, A Madasu, A Ekbal - Multimedia Tools and Applications, 2024 - Springer

Natural Language Understanding (NLU) and Natural Language Generation (NLG)
are the two critical components of every conversational system that handles the task of …

Cited by 5 · Related articles · All 4 versions

Reinforcing an image caption generator using off-line human feedback

PH Seo, P Sharma, T Levinboim, B Han… - Proceedings of the AAAI …, 2020 - aaai.org

Human ratings are currently the most accurate way to assess the quality of an image
captioning model, yet most often the only used outcome of an expensive human rating …

Cited by 26 · Related articles · All 9 versions

Quality estimation for image captions based on large-scale human evaluations

T Levinboim, AV Thapliyal, P Sharma… - arXiv preprint arXiv …, 2019 - arxiv.org

Automatic image captioning has improved significantly over the last few years, but the
problem is far from being solved, with state of the art models still often producing low quality …

Cited by 26 · Related articles · All 5 versions
