Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Diffusion Models for Image Restoration and Enhancement: A Comprehensive Survey

X Li, Y Ren, X Jin, C Lan, X Wang, W Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org
Image restoration (IR) has long been an indispensable and challenging task in low-level
vision, striving to improve the subjective quality of images distorted by various …

Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators

L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-video generation approaches rely on computationally heavy training and
require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …

ImageReward: Learning and evaluating human preferences for text-to-image generation

J Xu, X Liu, Y Wu, Y Tong, Q Li… - Advances in …, 2024 - proceedings.neurips.cc
We present a comprehensive solution to learn and improve text-to-image models from
human preference feedback. To begin with, we build ImageReward, the first general …

Text-to-image diffusion models in generative ai: A survey

C Zhang, C Zhang, M Zhang, IS Kweon - arXiv preprint arXiv:2303.07909, 2023 - arxiv.org
This survey reviews text-to-image diffusion models against the backdrop of diffusion models'
growing popularity across a wide range of generative tasks. As a self-contained work, this …

A survey on generative diffusion models

H Cao, C Tan, Z Gao, Y Xu, G Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep generative models have unlocked another profound realm of human creativity. By
capturing and generalizing patterns within data, they have brought us into the epoch of all …

On evaluating adversarial robustness of large vision-language models

Y Zhao, T Pang, C Du, X Yang, C Li… - Advances in …, 2024 - proceedings.neurips.cc
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented
performance in response generation, especially with visual inputs, enabling more creative …

One transformer fits all distributions in multi-modal diffusion at scale

F Bao, S Nie, K Xue, C Li, S Pu… - International …, 2023 - proceedings.mlr.press
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions
relevant to a set of multi-modal data in one model. Our key insight is that learning diffusion …

Any-to-any generation via composable diffusion

Z Tang, Z Yang, C Zhu, M Zeng… - Advances in Neural …, 2024 - proceedings.neurips.cc
We present Composable Diffusion (CoDi), a novel generative model capable of
generating any combination of output modalities, such as language, image, video, or audio …

Forget-me-not: Learning to forget in text-to-image diffusion models

G Zhang, K Wang, X Xu, Z Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The significant advances in applications of text-to-image generation models have prompted
demand for post-hoc adaptation algorithms that can efficiently remove unwanted …