From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

A review on generative adversarial networks: Algorithms, theory, and applications

J Gui, Z Sun, Y Wen, D Tao, J Ye - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Generative adversarial networks (GANs) have recently become a hot research topic;
however, they have been studied since 2014, and a large number of algorithms have been …

Generative adversarial network in medical imaging: A review

X Yi, E Walia, P Babyn - Medical image analysis, 2019 - Elsevier
Generative adversarial networks have gained a lot of attention in the computer vision
community due to their capability of data generation without explicitly modelling the …

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

A survey on generative adversarial networks: Variants, applications, and training

A Jabbar, X Li, B Omar - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
The Generative Models have gained considerable attention in unsupervised learning via a
new and practical framework called Generative Adversarial Networks (GAN) due to their …

VisualGPT: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

Object hallucination in image captioning

A Rohrbach, LA Hendricks, K Burns, T Darrell… - arXiv preprint arXiv …, 2018 - arxiv.org
Despite continuously improving performance, contemporary image captioning models are
prone to "hallucinating" objects that are not actually in a scene. One problem is that standard …

Say as you wish: Fine-grained control of image caption generation with abstract scene graphs

S Chen, Q Jin, P Wang, Q Wu - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Humans are able to describe image contents with coarse to fine details as they wish.
However, most image captioning models are intention-agnostic which cannot generate …

A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning

IAM Huijben, W Kool, MB Paulus… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by
its unnormalized (log-) probabilities. Over the past years, the machine learning community …

Positive-augmented contrastive learning for image and video captioning evaluation

S Sarto, M Barraco, M Cornia… - Proceedings of the …, 2023 - openaccess.thecvf.com
The CLIP model has been recently proven to be very effective for a variety of cross-modal
tasks, including the evaluation of captions generated from vision-and-language …