PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren… - … on Computer Vision, 2025 - Springer
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of
directly generating images at 4K resolution. PixArt-Σ represents a significant …

TextDiffuser-2: Unleashing the power of language models for text rendering

J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei - European Conference on …, 2025 - Springer
The diffusion model has proven to be a powerful generative model in recent years, yet
generating visual text remains a challenge. Although existing work has endeavored to …

Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

LEDITS++: Limitless image editing using text-to-image models

M Brack, F Friedrich, K Kornmeier… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image diffusion models have recently received increasing interest for their
astonishing ability to produce high-fidelity images from solely text inputs. Subsequent …

Representation alignment for generation: Training diffusion transformers is easier than you think

S Yu, S Kwak, H Jang, J Jeong, J Huang, J Shin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that the denoising process in (generative) diffusion models can
induce meaningful (discriminative) representations inside the model, though the quality of …

Pyramidal flow matching for efficient video generative modeling

Y Jin, Z Sun, N Li, K Xu, H Jiang, N Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
Video generation requires modeling a vast spatiotemporal space, which demands
significant computational resources and data usage. To reduce the complexity, the …

BK-SDM: A lightweight, fast, and cheap version of Stable Diffusion

BK Kim, HK Song, T Castells, S Choi - European Conference on Computer …, 2025 - Springer
Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high
computing demands due to billion-scale parameters. To enhance efficiency, recent studies …

ECLIPSE: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2),
achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks at …

Simple and scalable strategies to continually pre-train large language models

A Ibrahim, B Thérien, K Gupta, ML Richter… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start
the process over again once new data becomes available. A much more efficient solution is …

TurboEdit: Instant text-based image editing

Z Wu, N Kolkin, J Brandt, R Zhang… - European Conference on …, 2025 - Springer
We address the challenges of precise image inversion and disentangled image editing in
the context of few-step diffusion models. We introduce an encoder-based iterative inversion …