Getting it right: Improving spatial consistency in text-to-image models

A Chatterjee, GBM Stan, E Aflalo, S Paul… - … on Computer Vision, 2025 - Springer
One of the key shortcomings in current text-to-image (T2I) models is their inability to
consistently generate images which faithfully follow the spatial relationships specified in the …

MobileDiffusion: Subsecond text-to-image generation on mobile devices

Y Zhao, Y Xu, Z Xiao, T Hou - arXiv preprint arXiv:2311.16567, 2023 - arxiv.org
The deployment of large-scale text-to-image diffusion models on mobile devices is impeded
by their substantial model size and slow inference speed. In this paper, we propose …

LinFusion: 1 GPU, 1 minute, 16K image

S Liu, W Yu, Z Tan, X Wang - arXiv preprint arXiv:2409.02097, 2024 - arxiv.org
Modern diffusion models, particularly those utilizing a Transformer-based UNet for
denoising, rely heavily on self-attention operations to manage complex spatial relationships …

An information-theoretic evaluation of generative models in learning multi-modal distributions

M Jalali, CT Li, F Farnia - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The evaluation of generative models has received significant attention in the machine
learning community. When applied to a multi-modal distribution which is common among …

Diffusion priors for dynamic view synthesis from monocular videos

C Wang, P Zhuang, A Siarohin, J Cao, G Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within
videos. Existing methods struggle to distinguish between motion and structure …

Bigger is not always better: Scaling properties of latent diffusion models

K Mei, Z Tu, M Delbracio, H Talebi… - arXiv preprint arXiv …, 2024 - openreview.net
We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their
sampling efficiency. While improved network architecture and inference algorithms have …

T-Stitch: Accelerating sampling in pre-trained diffusion models with trajectory stitching

Z Pan, B Zhuang, DA Huang, W Nie, Z Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality
image generation and typically requires many steps with a large model. In this paper, we …

MobileDiffusion: Instant text-to-image generation on mobile devices

Y Zhao, Y Xu, Z Xiao, H Jia, T Hou - European Conference on Computer …, 2025 - Springer
The deployment of large-scale text-to-image diffusion models on mobile devices is impeded
by their substantial model size and high latency. In this paper, we present MobileDiffusion …

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization

H Zhu, D Tang, J Liu, M Lu, J Zheng, J Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have achieved remarkable progress in the field of image generation due to
their outstanding capabilities. However, these models require substantial computing …

Effective diffusion transformer architecture for image super-resolution

K Cheng, L Yu, Z Tu, X He, L Chen, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances indicate that diffusion models hold great promise in image super-resolution.
While the latest methods are primarily based on latent diffusion models with …