Image captioning: a comprehensive survey

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by 298 · Related articles · All 11 versions

[PDF] ieee.org

A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org

Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most significantly and notably in the Computer Vision …

Cited by 147 · Related articles · All 5 versions

[PDF] whiterose.ac.uk

Textual context-aware dense captioning with diverse words

Z Shao, J Han, K Debattista… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Dense captioning generates more detailed spoken descriptions for complex visual scenes.
Despite several promising leads, existing methods still have two broad limitations: 1) The …

Cited by 43 · Related articles · All 3 versions

[PDF] thecvf.com

Segment and caption anything

X Huang, J Wang, Y Tang, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability
to generate regional captions. SAM presents strong generalizability to segment anything …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Neural attention for image captioning: review of outstanding methods

Z Zohourianshahzadi, JK Kalita - Artificial Intelligence Review, 2022 - Springer

Image captioning is the task of automatically generating sentences that describe an input
image in the best way possible. The most successful techniques for automatically generating …

Cited by 39 · Related articles · All 7 versions

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier

Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

Cited by 17 · Related articles · All 2 versions

[PDF] arxiv.org

PLIP: Language-image pre-training for person representation learning

J Zuo, C Yu, N Sang, C Gao - arXiv preprint arXiv:2305.08386, 2023 - arxiv.org

Pre-training has emerged as an effective technique for learning powerful person
representations. Most existing methods have shown that pre-training on pure-vision large …

Cited by 17 · Related articles · All 2 versions

[PDF] whiterose.ac.uk

DCMSTRD: end-to-end dense captioning via multi-scale transformer decoding

Z Shao, J Han, K Debattista… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Dense captioning creates diverse Region of Interests (RoIs) descriptions for complex visual
scenes. While promising results have been obtained, several issues persist. In particular: 1) …

Cited by 10 · Related articles · All 2 versions

[PDF] mdpi.com

IoT monitoring and prediction modeling of honeybee activity with alarm

N Andrijević, V Urošević, B Arsić, D Herceg, B Savić - Electronics, 2022 - mdpi.com

A significant number of recent scientific papers have raised awareness of changes in the
biological world of bees, problems with their extinction, and, as a consequence, their impact …

Cited by 28 · Related articles · All 6 versions

Image captioning improved visual question answering

H Sharma, AS Jalal - Multimedia tools and applications, 2022 - Springer

Both Visual Question Answering (VQA) and image captioning are the problems
which involve Computer Vision (CV) and Natural Language Processing (NLP) domains. In …

Cited by 31 · Related articles · All 4 versions
