On architectural compression of text-to-image diffusion models

X Li, Y Ren, X Jin, C Lan, X Wang, W Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org

Image restoration (IR) has been an indispensable and challenging task in the low-level
vision field, which strives to improve the subjective quality of images distorted by various …

Cited by 79   Related articles   All 2 versions

SDXL: Improving latent diffusion models for high-resolution image synthesis

D Podell, Z English, K Lacey, A Blattmann… - arXiv preprint arXiv …, 2023 - arxiv.org

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to
previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone …

Cited by 1489   Related articles   All 4 versions

SnapFusion: Text-to-image diffusion model on mobile devices within two seconds

Y Li, H Wang, Q Jin, J Hu… - Advances in …, 2024 - proceedings.neurips.cc

Text-to-image diffusion models can create stunning images from natural language
descriptions that rival the work of professional artists and photographers. However, these …

Cited by 135   Related articles   All 5 versions

DeepCache: Accelerating diffusion models for free

X Ma, G Fang, X Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Diffusion models have recently gained unprecedented attention in the field of image
synthesis due to their remarkable generative capabilities. Notwithstanding their prowess …

Cited by 93   Related articles   All 3 versions

On the design fundamentals of diffusion models: A survey

Z Chang, GA Koulieris, HPH Shum - arXiv preprint arXiv:2306.04542, 2023 - arxiv.org

Diffusion models are generative models, which gradually add and remove noise to learn the
underlying distribution of training data for data generation. The components of diffusion …

Cited by 58   Related articles   All 2 versions
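
The survey snippet above describes diffusion models as gradually adding and removing noise. As a rough illustration of the "adding noise" half only, here is a minimal sketch of a DDPM-style forward process. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps (a common default, not necessarily what the survey discusses) and uses the closed-form sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with abar_t the cumulative product of (1 - beta). The function names are hypothetical and are not taken from the survey.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Per-step noise variances beta_1..beta_T spaced linearly (assumed DDPM-style default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: list[float], t: int, abars: list[float], rng: random.Random) -> tuple[list[float], list[float]]:
    """Draw x_t ~ q(x_t | x_0) in one shot; t is a 0-based schedule index.

    Returns (x_t, eps) so a denoiser could later be trained to predict eps.
    """
    a = abars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


if __name__ == "__main__":
    rng = random.Random(0)
    abars = alpha_bars(linear_beta_schedule(1000))
    x0 = [1.0, -0.5, 0.25]
    # Signal weight sqrt(abar_t) shrinks as t grows, so x_t drifts toward pure noise.
    for t in (0, 499, 999):
        xt, _ = q_sample(x0, t, abars, rng)
        print(t, round(abars[t], 4), [round(v, 3) for v in xt])
```

The reverse ("removing noise") half requires a trained noise-prediction network and is not sketched here.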

Object-centric diffusion for efficient video editing

K Kahatapitiya, A Karjauv, D Abati, F Porikli… - … on Computer Vision, 2025 - Springer

Diffusion-based video editing has reached impressive quality and can transform either the
global style, local structure, and attributes of given video inputs, following textual edit …

Cited by 9   Related articles   All 4 versions

ECLIPSE: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - Proceedings of the …, 2024 - openaccess.thecvf.com

Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2),
achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks at …

Cited by 14   Related articles   All 3 versions

MobileDiffusion: Subsecond text-to-image generation on mobile devices

Y Zhao, Y Xu, Z Xiao, T Hou - arXiv preprint arXiv:2311.16567, 2023 - arxiv.org

The deployment of large-scale text-to-image diffusion models on mobile devices is impeded
by their substantial model size and slow inference speed. In this paper, we propose …

Cited by 42   Related articles   All 2 versions

Faster diffusion: Rethinking the role of UNet encoder in diffusion models

S Li, T Hu, F Shahbaz Khan, L Li, S Yang… - arXiv e …, 2023 - ui.adsabs.harvard.edu

One of the key components within diffusion models is the UNet for noise prediction. While
several works have explored basic properties of the UNet decoder, its encoder largely …

Cited by 21   Related articles

Bigger is not always better: Scaling properties of latent diffusion models

K Mei, Z Tu, M Delbracio, H Talebi… - arXiv preprint arXiv …, 2024 - openreview.net

We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their
sampling efficiency. While improved network architecture and inference algorithms have …

Cited by 7   Related articles   All 2 versions
