State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

Training-free consistent text-to-image generation

Y Tewel, O Kaduri, R Gal, Y Kasten, L Wolf… - ACM Transactions on …, 2024 - dl.acm.org
Text-to-image models offer a new level of creative flexibility by allowing users to guide the
image generation process through natural language. However, using these models to …

OMG: Occlusion-friendly personalized multi-concept generation in diffusion models

Z Kong, Y Zhang, T Yang, T Wang, K Zhang… - … on Computer Vision, 2025 - Springer
Personalization is an important topic in text-to-image generation, especially the challenging
multi-concept personalization. Current multi-concept methods are struggling with identity …

Customizing text-to-image models with a single image pair

M Jones, SY Wang, N Kumari, D Bau… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Art reinterpretation is the practice of creating a variation of a reference work, making a paired
artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to …

MoA: Mixture-of-attention for subject-context disentanglement in personalized image generation

KC Wang, D Ostashev, Y Fang, S Tulyakov… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We introduce a new architecture for personalization of text-to-image diffusion models,
coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in …

Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …

Multi-modal generative AI: Multi-modal LLM, diffusion and beyond

H Chen, X Wang, Y Zhou, B Huang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …

MC²: Multi-concept Guidance for Customized Multi-concept Generation

J Jiang, Y Zhang, K Feng, X Wu, W Zuo - arXiv preprint arXiv:2404.05268, 2024 - arxiv.org
Customized text-to-image generation aims to synthesize instantiations of user-specified
concepts and has achieved unprecedented progress in handling individual concept …

Recovering the Pre-Fine-Tuning Weights of Generative Models

E Horwitz, J Kahana, Y Hoshen - arXiv preprint arXiv:2402.10208, 2024 - arxiv.org
The dominant paradigm in generative modeling consists of two steps: i) pre-training on a
large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine …

Break-for-make: Modular low-rank adaptations for composable content-style customization

Y Xu, F Tang, J Cao, Y Zhang, O Deussen… - arXiv preprint arXiv …, 2024 - arxiv.org
Personalized generation paradigms empower designers to customize visual intellectual
properties with the help of textual descriptions by tuning or adapting pre-trained text-to …