Non-autoregressive image captioning with counterfactuals-critical multi-agent learning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by 396 - Related articles - All 11 versions

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 109 - Related articles - All 5 versions

Semantic-conditional diffusion networks for image captioning

J Luo, Y Li, Y Pan, T Yao, J Feng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent advances on text-to-image generation have witnessed the rise of diffusion models
which act as powerful generative models. Nevertheless, it is not trivial to exploit such latent …

Cited by 78 - Related articles - All 5 versions

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Cited by 91 - Related articles - All 8 versions

UFC-BERT: Unifying multi-modal controls for conditional image synthesis

Z Zhang, J Ma, C Zhou, R Men, Z Li… - Advances in …, 2021 - proceedings.neurips.cc

Conditional image synthesis aims to create an image according to some multi-modal
guidance in the forms of textual descriptions, reference images, and image blocks to …

Cited by 75 - Related articles - All 6 versions

PIMNet: A parallel, iterative and mimicking network for scene text recognition

Z Qiao, Y Zhou, J Wei, W Wang, Y Zhang… - Proceedings of the 29th …, 2021 - dl.acm.org

Nowadays, scene text recognition has attracted more and more attention due to its various
applications. Most state-of-the-art methods adopt an encoder-decoder framework with …

Cited by 78 - Related articles - All 3 versions

DeeCap: Dynamic early exiting for efficient image captioning

Z Fei, X Yan, S Wang, Q Tian - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Both accuracy and efficiency are crucial for image captioning in real-world scenarios.
Although Transformer-based models have gained significant improved captioning …

Cited by 49 - Related articles - All 4 versions

Learning distinct and representative modes for image captioning

Q Chen, C Deng, Q Wu - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Over the years, state-of-the-art (SoTA) image captioning methods have achieved promising
results on some evaluation metrics (e.g., CIDEr). However, recent findings show that the …

Cited by 26 - Related articles - All 6 versions

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Cited by 21 - Related articles - All 3 versions

Deep image captioning: A review of methods, trends and future challenges

L Xu, Q Tang, J Lv, B Zheng, X Zeng, W Li - Neurocomputing, 2023 - Elsevier

Image captioning, also called report generation in medical field, aims to describe visual
content of images in human language, which requires to model semantic relationship …

Cited by 39 - Related articles - All 2 versions
