MUCH: mutual coupling enhancement of scene recognition and dense captioning


Articles citing this work (4 results)

[PDF] arxiv.org

A comprehensive survey of 3d dense captioning: Localizing and describing objects in 3d scenes

T Yu, X Lin, S Wang, W Sheng… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Three-Dimensional (3D) dense captioning is an emerging vision-language bridging task that
aims to generate multiple detailed and accurate descriptions for 3D scenes. It presents …

Cited by 3 - Related articles - All 4 versions

A comprehensive survey on deep-learning-based visual captioning

B Xin, N Xu, Y Zhai, T Zhang, Z Lu, J Liu, W Nie, X Li… - Multimedia …, 2023 - Springer

Generating a description for an image/video is termed as the visual captioning task. It
requires the model to capture the semantic information of visual content and translate them …

Generalized zero-shot learning with multi-source semantic embeddings for scene recognition

X Song, H Zeng, S Zhang, L Herranz… - Proceedings of the 28th …, 2020 - dl.acm.org

Recognizing visual categories from semantic descriptions is a promising way to extend the
capability of a visual classifier beyond the concepts represented in the training data (i.e., seen …

Cited by 11 - Related articles - All 3 versions

Be specific, be clear: Bridging machine and human captions by scene-guided transformer

Y Huang, Z Zeng, Y Lu - Proceedings of the 2021 Workshop on Multi …, 2021 - dl.acm.org

Automatically generating natural language descriptions for images, i.e., image captioning, is
one of the primary goals for multimedia understanding. The recent success of deep neural …

Cited by 6 - Related articles
