PhotoMaker: Customizing realistic human photos via stacked ID embedding

Z Li, M Cao, X Wang, Z Qi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in text-to-image generation have made remarkable progress in
synthesizing realistic human photos conditioned on given text prompts. However, existing …

Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs

L Yang, Z Yu, C Meng, M Xu, S Ermon… - Forty-first International …, 2024 - openreview.net
Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, L Fei-Fei… - arXiv preprint arXiv …, 2023 - arxiv.org
We present WALT, a transformer-based approach for photorealistic video generation via
diffusion modeling. Our approach has two key design decisions. First, we use a causal …

Is Sora a world simulator? A comprehensive survey on general world models and beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild

F Yu, J Gu, Z Li, J Hu, X Kong, X Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image
restoration method that harnesses generative prior and the power of model scaling up …

Cache me if you can: Accelerating diffusion models through block caching

F Wimbauer, B Wu, E Schoenfeld… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have recently revolutionized the field of image synthesis due to their ability
to generate photorealistic images. However, one of the major drawbacks of diffusion models …

PixArt-Σ: Weak-to-strong training of diffusion transformer for 4K text-to-image generation

J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce PixArt-Σ, a Diffusion Transformer (DiT) model capable of
directly generating images at 4K resolution. PixArt-Σ represents a significant …

SCEdit: Efficient and controllable image diffusion generation via skip connection editing

Z Jiang, C Mao, Y Pan, Z Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Image diffusion models have been utilized in various tasks such as text-to-image generation
and controllable image synthesis. Recent research has introduced tuning methods that …

GenTron: Diffusion Transformers for Image and Video Generation

S Chen, M Xu, J Ren, Y Cong, S He… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this study, we explore Transformer-based diffusion models for image and video
generation. Despite the dominance of Transformer architectures in various fields due to their …

ECLIPSE: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2),
achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks at …