Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …

Customizing text-to-image generation with inverted interaction

M Ge, X Jia, T Isobe, X Li, Q Wang, J Mu… - Proceedings of the …, 2024 - dl.acm.org
Subject-driven image generation, aimed at customizing user-specified subjects, has
experienced rapid progress. However, most of them focus on transferring the customized …

ReCorD: Reasoning and correcting diffusion for HOI generation

JY Jiang-Lin, KY Huang, L Lo, YN Huang… - Proceedings of the …, 2024 - dl.acm.org
Diffusion models revolutionize image generation by leveraging natural language to guide
the creation of multimedia content. Despite significant advancements in such generative …

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

H Zhang, D Hong, T Gao, Y Wang, J Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been recognized for their ability to generate images that are not only
visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) …

Chains of Diffusion Models

Y Wei, L Huang, ZF Wu, W Wang, Y Liu, M Jia… - European Conference on …, 2024 - Springer
Recent generative models excel in creating high-quality single-human images but fail in
complex multi-human scenarios, failing to capture accurate structural details like quantities …

PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation

X Hu, H Wang, JE Lenssen, B Schiele - arXiv preprint arXiv:2501.05823, 2025 - arxiv.org
We introduce PersonaHOI, a training- and tuning-free framework that fuses a general
StableDiffusion model with a personalized face diffusion (PFD) model to generate identity …

ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping

Y Pang, R Shao, J Zhang, H Tu, Y Liu, B Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce ManiVideo, a novel method for generating consistent and
temporally coherent bimanual hand-object manipulation videos from given motion …

Conditional Image Synthesis with Diffusion Models: A Survey

Z Zhan, D Chen, JP Mei, Z Zhao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Conditional image synthesis based on user-specified requirements is a key component in
creating complex visual content. In recent years, diffusion-based generative modeling has …

HOIEdit: Human–object interaction editing with text-to-image diffusion model

T Xu, W Wang, A Zhong - The Visual Computer, 2025 - Springer
A scene consists of objects and relationships. Despite the success in large-scale
text-to-image generation and text-guided image editing, most existing studies focus on single …

Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis

R Zhou, Y Zhang, C Yuan, F Permenter… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces a generative model designed for multimodal control over
text-to-image foundation generative AI models such as Stable Diffusion, specifically tailored for …