Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Adversarial text-to-image synthesis: A review

S Frolov, T Hinz, F Raue, J Hees, A Dengel - Neural Networks, 2021 - Elsevier
With the advent of generative adversarial networks, synthesizing images from text
descriptions has recently become an active research area. It is a flexible and intuitive way for …

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Semantic object accuracy for generative text-to-image synthesis

T Hinz, S Heinrich, S Wermter - IEEE transactions on pattern …, 2020 - ieeexplore.ieee.org
Generative adversarial networks conditioned on textual image descriptions are capable of
generating realistic-looking images. However, current methods still struggle to generate …

Improving text-to-image synthesis using contrastive learning

H Ye, X Yang, M Takac, R Sunderraman… - arXiv preprint arXiv …, 2021 - arxiv.org
The goal of text-to-image synthesis is to generate a visually realistic image that matches a
given text description. In practice, the captions annotated by humans for the same image …

Neural architecture search with a lightweight transformer for text-to-image synthesis

W Li, S Wen, K Shi, Y Yang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Despite the cross-modal text-to-image synthesis task has achieved great success, most of
the latest works in this field are based on the network architectures proposed by …

A survey and taxonomy of adversarial neural networks for text‐to‐image synthesis

J Agnese, J Herrera, H Tao… - … Reviews: Data Mining and …, 2020 - Wiley Online Library
Text‐to‐image synthesis refers to computational methods which translate human written
textual descriptions, in the form of keywords or sentences, into images with similar semantic …

A comprehensive survey on generative adversarial networks used for synthesizing multimedia content

L Kumar, DK Singh - Multimedia Tools and Applications, 2023 - Springer
GAN's are playing an important role in creating and generating a new set of data from the
previously available content. GAN models are impressive in the results for image and video …

Multi-sentence auxiliary adversarial networks for fine-grained text-to-image synthesis

Y Yang, L Wang, D Xie, C Deng… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Due to the development of Generative Adversarial Networks (GANs), significant progress
has been achieved in text-to-image synthesis task. However, most previous works have only …