SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow

Y Zhu, X Liu, Q Liu - European Conference on Computer Vision, 2025 - Springer
Diffusion models excel in high-quality generation but suffer from slow inference due to
iterative sampling. While recent methods have successfully transformed diffusion models …

Text-to-3D Shape Generation

H Lee, M Savva, AX Chang - Computer Graphics Forum, 2024 - Wiley Online Library
Recent years have seen an explosion of work and interest in text-to-3D shape generation.
Much of the progress is driven by advances in 3D representations, large-scale pretraining …

Decouple-Then-Merge: Towards Better Training for Diffusion Models

Q Ma, X Ning, D Liu, L Niu, L Zhang - arXiv preprint arXiv:2410.06664, 2024 - arxiv.org
Diffusion models are trained by learning a sequence of models that reverse each step of
noise corruption. Typically, the model parameters are fully shared across multiple timesteps …

EdgeFusion: On-Device Text-to-Image Generation

T Castells, HK Song, T Piao, S Choi, BK Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
The intensive computational burden of Stable Diffusion (SD) for text-to-image generation
poses a significant hurdle for its practical application. To tackle this challenge, recent …

Hash3D: Training-free Acceleration for 3D Generation

X Yang, X Wang - arXiv preprint arXiv:2404.06091, 2024 - arxiv.org
The evolution of 3D generative modeling has been notably propelled by the adoption of 2D
diffusion models. Despite this progress, the cumbersome optimization process per se …

Layer-and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

H You, C Barnes, Y Zhou, Y Kang, Z Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation
quality but suffer from high latency and memory inefficiency, making them difficult to deploy …

Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models

Y Wu, H Wang, Z Chen, D Xu - arXiv preprint arXiv:2411.18375, 2024 - arxiv.org
The high computational cost and slow inference time are major obstacles to deploying the
video diffusion model (VDM) in practical applications. To overcome this, we introduce a new …

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

H Wang, CY Ma, YC Liu, J Hou, T Xu, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video generation enhances content creation but is highly computationally intensive:
The computational cost of Diffusion Transformers (DiTs) scales quadratically in the number …

Conditional Image Synthesis with Diffusion Models: A Survey

Z Zhan, D Chen, JP Mei, Z Zhao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Conditional image synthesis based on user-specified requirements is a key component in
creating complex visual content. In recent years, diffusion-based generative modeling has …

Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models

W Cho, S Choi, D Das, M Reisser, T Kim, S Yun… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in text-to-image diffusion models have enabled the personalization of
these models to generate custom images from textual prompts. This paper presents an …