Scaling rectified flow transformers for high-resolution image synthesis

P Esser, S Kulal, A Blattmann, R Entezari… - … on Machine Learning, 2024 - openreview.net
Diffusion models create data from noise by inverting the forward paths of data towards noise
and have emerged as a powerful generative modeling technique for high-dimensional …

ZigMa: A DiT-style zigzag Mamba diffusion model

VT Hu, SA Baumann, M Gui, O Grebenkova… - … on Computer Vision, 2024 - Springer
The diffusion model has long been plagued by scalability and quadratic complexity issues,
especially within transformer-based structures. In this study, we aim to leverage the long …

Diffusion models and representation learning: A survey

M Fuest, P Ma, M Gui, JS Fischer, VT Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Models are popular generative modeling methods in various vision tasks, attracting
significant attention. They can be considered a unique instance of self-supervised learning …

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Q Liu, X Yin, A Yuille, A Brown, M Singh - arXiv preprint arXiv:2412.15213, 2024 - arxiv.org
Diffusion models, and their generalization, flow matching, have had a remarkable impact on
the field of media generation. Here, the conventional approach is to learn the complex …

DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow

K Deng, Y Guo, J Sun, Z Zou, Y Li, X Cai, Y Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern 3D generation methods can rapidly create shapes from sparse or single views, but
their outputs often lack geometric detail due to computational constraints. We present …

FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

T Ki, D Min, G Chae - arXiv preprint arXiv:2412.01064, 2024 - arxiv.org
With the rapid advancement of diffusion-based generative models, portrait image animation
has achieved remarkable results. However, it still faces challenges in temporally consistent …