Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Diffusion Models for Image Restoration and Enhancement: A Comprehensive Survey

X Li, Y Ren, X Jin, C Lan, X Wang, W Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org
Image restoration (IR) has long been an indispensable and challenging task in low-level
vision, striving to improve the subjective quality of images distorted by various …

Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators

L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-video generation approaches rely on computationally heavy training and
require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …

ImageReward: Learning and evaluating human preferences for text-to-image generation

J Xu, X Liu, Y Wu, Y Tong, Q Li… - Advances in …, 2024 - proceedings.neurips.cc
We present a comprehensive solution to learn and improve text-to-image models from
human preference feedback. To begin with, we build ImageReward, the first general …

Text-to-image diffusion models in generative ai: A survey

C Zhang, C Zhang, M Zhang, IS Kweon - arXiv preprint arXiv:2303.07909, 2023 - arxiv.org
This survey reviews text-to-image diffusion models against the backdrop of diffusion models'
growing popularity across a wide range of generative tasks. As a self-contained work, this …

A survey on generative diffusion models

H Cao, C Tan, Z Gao, Y Xu, G Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep generative models have unlocked another profound realm of human creativity. By
capturing and generalizing patterns within data, they have brought us into the epoch of all …

On evaluating adversarial robustness of large vision-language models

Y Zhao, T Pang, C Du, X Yang, C Li… - Advances in …, 2024 - proceedings.neurips.cc
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented
performance in response generation, especially with visual inputs, enabling more creative …

One transformer fits all distributions in multi-modal diffusion at scale

F Bao, S Nie, K Xue, C Li, S Pu… - International …, 2023 - proceedings.mlr.press
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions
relevant to a set of multi-modal data in one model. Our key insight is that learning diffusion …

Any-to-any generation via composable diffusion

Z Tang, Z Yang, C Zhu, M Zeng… - Advances in Neural …, 2024 - proceedings.neurips.cc
We present Composable Diffusion (CoDi), a novel generative model capable of
generating any combination of output modalities, such as language, image, video, or audio …

Forget-me-not: Learning to forget in text-to-image diffusion models

G Zhang, K Wang, X Xu, Z Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The significant advances in applications of text-to-image generation models have prompted
demand for post-hoc adaptation algorithms that can efficiently remove unwanted …