From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM computing surveys, 2022 - dl.acm.org
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

Imagic: Text-based real image editing with diffusion models

B Kawar, S Zada, O Lang, O Tov… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-conditioned image editing has recently attracted considerable interest. However, most
methods are currently limited to one of the following: specific editing types (eg, object …

Imagen video: High definition video generation with diffusion models

J Ho, W Chan, C Saharia, J Whang, R Gao… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …

Multi-concept customization of text-to-image diffusion

N Kumari, B Zhang, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
While generative models produce high-quality images of concepts learned from a large-
scale database, a user often wishes to synthesize instantiations of their own concepts (for …

Scaling up gans for text-to-image synthesis

M Kang, JY Zhu, R Zhang, J Park… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …

Text2video-zero: Text-to-image diffusion models are zero-shot video generators

L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-video generation approaches rely on computationally heavy training and
require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …

Zero-shot image-to-image translation

G Parmar, K Kumar Singh, R Zhang, Y Li, J Lu… - ACM SIGGRAPH 2023 …, 2023 - dl.acm.org
Large-scale text-to-image generative models have shown their remarkable ability to
synthesize diverse, high-quality images. However, directly applying these models for real …

Fatezero: Fusing attentions for zero-shot text-based video editing

C Qi, X Cun, Y Zhang, C Lei, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The diffusion-based generative models have achieved remarkable success in text-based
image generation. However, since it contains enormous randomness in generation …

Stylegan-t: Unlocking the power of gans for fast large-scale text-to-image synthesis

A Sauer, T Karras, S Laine… - … on machine learning, 2023 - proceedings.mlr.press
Text-to-image synthesis has recently seen significant progress thanks to large pretrained
language models, large-scale training data, and the introduction of scalable model families …