Speaking the same language: Matching machine to human captions by adversarial training

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 296 · Related articles · All 11 versions

[PDF] arxiv.org

A review on generative adversarial networks: Algorithms, theory, and applications

J Gui, Z Sun, Y Wen, D Tao, J Ye - IEEE transactions on …, 2021 - ieeexplore.ieee.org

Generative adversarial networks (GANs) have recently become a hot research topic;
however, they have been studied since 2014, and a large number of algorithms have been …

Cited by 969 · Related articles · All 13 versions

[PDF] arxiv.org

Generative adversarial network in medical imaging: A review

X Yi, E Walia, P Babyn - Medical image analysis, 2019 - Elsevier

Generative adversarial networks have gained a lot of attention in the computer vision
community due to their capability of data generation without explicitly modelling the …

Cited by 1658 · Related articles · All 11 versions

[PDF] arxiv.org

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Cited by 874 · Related articles · All 8 versions

[PDF] arxiv.org

A survey on generative adversarial networks: Variants, applications, and training

A Jabbar, X Li, B Omar - ACM Computing Surveys (CSUR), 2021 - dl.acm.org

The Generative Models have gained considerable attention in unsupervised learning via a
new and practical framework called Generative Adversarial Networks (GAN) due to their …

Cited by 297 · Related articles · All 3 versions

[PDF] thecvf.com

VisualGPT: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

Cited by 159 · Related articles · All 12 versions

[PDF] arxiv.org

Object hallucination in image captioning

A Rohrbach, LA Hendricks, K Burns, T Darrell… - arXiv preprint arXiv …, 2018 - arxiv.org

Despite continuously improving performance, contemporary image captioning models are
prone to "hallucinating" objects that are not actually in a scene. One problem is that standard …

Cited by 384 · Related articles · All 3 versions

[PDF] thecvf.com

Say as you wish: Fine-grained control of image caption generation with abstract scene graphs

S Chen, Q Jin, P Wang, Q Wu - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Humans are able to describe image contents with coarse to fine details as they wish.
However, most image captioning models are intention-agnostic which cannot generate …

Cited by 237 · Related articles · All 6 versions

[PDF] arxiv.org

A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning

IAM Huijben, W Kool, MB Paulus… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …

Cited by 84 · Related articles · All 9 versions

[PDF] thecvf.com

Positive-augmented contrastive learning for image and video captioning evaluation

S Sarto, M Barraco, M Cornia… - Proceedings of the …, 2023 - openaccess.thecvf.com

The CLIP model has been recently proven to be very effective for a variety of cross-modal
tasks, including the evaluation of captions generated from vision-and-language …

Cited by 28 · Related articles · All 9 versions
