PixArt-$\alpha$: Fast training of diffusion transformer for photorealistic text-to-image synthesis

Z Li, M Cao, X Wang, Z Qi… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advances in text-to-image generation have made remarkable progress in
synthesizing realistic human photos conditioned on given text prompts. However, existing …

Cited by 29 · Related articles · All 3 versions

[PDF] openreview.net

Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal llms

L Yang, Z Yu, C Meng, M Xu, S Ermon… - Forty-first International …, 2024 - openreview.net

Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …

Cited by 27 · Related articles · All 3 versions

[PDF] arxiv.org

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, L Fei-Fei… - arXiv preprint arXiv …, 2023 - arxiv.org

We present WALT, a transformer-based approach for photorealistic video generation via
diffusion modeling. Our approach has two key design decisions. First, we use a causal …

Cited by 54 · Related articles · All 3 versions

[PDF] arxiv.org

Is Sora a world simulator? A comprehensive survey on general world models and beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org

General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Cited by 7 · Related articles · All 3 versions

[PDF] thecvf.com

Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild

F Yu, J Gu, Z Li, J Hu, X Kong, X Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image
restoration method that harnesses generative prior and the power of model scaling up …

Cited by 10 · Related articles · All 3 versions

[PDF] thecvf.com

Cache me if you can: Accelerating diffusion models through block caching

F Wimbauer, B Wu, E Schoenfeld… - Proceedings of the …, 2024 - openaccess.thecvf.com

Diffusion models have recently revolutionized the field of image synthesis due to their ability
to generate photorealistic images. However, one of the major drawbacks of diffusion models …

Cited by 7 · Related articles · All 3 versions

[PDF] arxiv.org

PixArt-$\Sigma$: Weak-to-strong training of diffusion transformer for 4K text-to-image generation

J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren… - arXiv preprint arXiv …, 2024 - arxiv.org

In this paper, we introduce PixArt-$\Sigma$, a Diffusion Transformer model (DiT) capable of
directly generating images at 4K resolution. PixArt-$\Sigma$ represents a significant …

Cited by 21 · Related articles · All 2 versions

[PDF] thecvf.com

Scedit: Efficient and controllable image diffusion generation via skip connection editing

Z Jiang, C Mao, Y Pan, Z Han… - Proceedings of the …, 2024 - openaccess.thecvf.com

Image diffusion models have been utilized in various tasks such as text-to-image generation
and controllable image synthesis. Recent research has introduced tuning methods that …

Cited by 8 · Related articles · All 3 versions

[PDF] thecvf.com

GenTron: Diffusion Transformers for Image and Video Generation

S Chen, M Xu, J Ren, Y Cong, S He… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this study, we explore Transformer-based diffusion models for image and video
generation. Despite the dominance of Transformer architectures in various fields due to their …

Cited by 1 · Related articles

[PDF] thecvf.com

Eclipse: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - Proceedings of the …, 2024 - openaccess.thecvf.com

Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2),
achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks at …

Cited by 5 · Related articles · All 3 versions
