Self-critical n-step training for image captioning

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by 385 Related articles All 11 versions

[PDF] sciencedirect.com

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 107 Related articles All 5 versions

[PDF] arxiv.org

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org

Image captioning is a fundamental task in vision-language understanding, where the model
predicts a textual informative caption to a given input image. In this paper, we present a …

Cited by 750 Related articles All 2 versions

[PDF] arxiv.org

TM2T: Stochastic and tokenized modeling for the reciprocal generation of 3D human motions and texts

C Guo, X Zuo, S Wang, L Cheng - European Conference on Computer …, 2022 - Springer

Inspired by the strong ties between vision and language, the two intimate human sensing
and communication modalities, our paper aims to explore the generation of 3D human full …

Cited by 202 Related articles All 8 versions

[PDF] arxiv.org

Multimodal transformer with multi-view visual representation for image captioning

J Yu, J Li, Z Yu, Q Huang - … on circuits and systems for video …, 2019 - ieeexplore.ieee.org

Image captioning aims to automatically generate a natural language description of a given
image, and most state-of-the-art models have adopted an encoder-decoder framework. The …

Cited by 437 Related articles All 5 versions

Region-aware image captioning via interaction learning

AA Liu, Y Zhai, N Xu, W Nie, W Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Image captioning is one of the primary goals in computer vision which aims to automatically
generate natural descriptions for images. Intuitively, human visual system can notice some …

Cited by 116 Related articles

[PDF] thecvf.com

More photos are all you need: Semi-supervised learning for fine-grained sketch based image retrieval

AK Bhunia, PN Chowdhury, A Sain… - Proceedings of the …, 2021 - openaccess.thecvf.com

A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-
SBIR) models is the data scarcity--model performances are largely bottlenecked by the lack …

Cited by 81 Related articles All 9 versions

[PDF] arxiv.org

Fashion captioning: Towards generating accurate descriptions with semantic rewards

X Yang, H Zhang, D Jin, Y Liu, CH Wu, J Tan… - Computer Vision–ECCV …, 2020 - Springer

Generating accurate descriptions for online fashion items is important not only for enhancing
customers' shopping experiences, but also for the increase of online sales. Besides the …

Cited by 85 Related articles All 8 versions

[PDF] port.ac.uk

Visuals to text: A comprehensive review on automatic image captioning

Y Ming, N Hu, C Fan, F Feng… - IEEE/CAA Journal of …, 2022 - researchportal.port.ac.uk

Image captioning refers to automatic generation of descriptive texts according to the visual
content of images. It is a technique integrating multiple disciplines including the computer …

Cited by 43 Related articles All 6 versions

[PDF] arxiv.org

Fine-grained image captioning with global-local discriminative objective

J Wu, T Chen, H Wu, Z Yang, G Luo… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Significant progress has been made in recent years in image captioning, an active topic in
the fields of vision and language. However, existing methods tend to yield overly general …

Cited by 75 Related articles All 5 versions
