The diffusion model has been proven a powerful generative model in recent years, yet it remains a challenge in generating visual text. Although existing work has endeavored to …
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images …
M Brack, F Friedrich, K Kornmeier… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent …
Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of …
Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage. To reduce the complexity, the …
Abstract Text-to-image (T2I) generation with Stable Diffusion models (SDMs) involves high computing demands due to billion-scale parameters. To enhance efficiency, recent studies …
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is …
Z Wu, N Kolkin, J Brandt, R Zhang… - European Conference on …, 2025 - Springer
We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder based iterative inversion …