Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

X Shuai, H Ding, X Ma, R Tu, YG Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Image editing aims to modify a given synthetic or real image to meet specific user requirements.
It has been widely studied in recent years as a promising and challenging field of …

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

Y Chang, Y Zhang, Z Fang, Y Wu, Y Bisk… - arXiv preprint arXiv …, 2024 - arxiv.org
The literature on text-to-image generation is plagued by issues of faithfully composing
entities with relations. Yet there is no formal understanding of how entity-relation …

SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

X Liu, Y Wei, M Liu, X Lin, P Ren, X Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Human visual imagination usually begins with analogies or rough sketches. For example,
given an image with a girl playing guitar before a building, one may analogously imagine …

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

Y Sun, Y Liu, Y Tang, W Pei, K Chen - arXiv preprint arXiv:2406.18958, 2024 - arxiv.org
The field of text-to-image (T2I) generation has made significant progress in recent years,
largely driven by advancements in diffusion models. Linguistic control enables effective …

Video Diffusion Models are Training-free Motion Interpreter and Controller

Z Xiao, Y Zhou, S Yang, X Pan - arXiv preprint arXiv:2405.14864, 2024 - arxiv.org
Video generation primarily aims to model authentic and customized motion across frames,
making understanding and controlling the motion a crucial topic. Most diffusion-based …

A Survey on Personalized Content Synthesis with Diffusion Models

X Zhang, XY Wei, W Zhang, J Wu, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in generative models have significantly impacted content creation,
leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user …

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

P Ling, J Bu, P Zhang, X Dong, Y Zang, T Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Motion-based controllable text-to-video generation uses motion signals to guide video
generation. Previous methods typically require training models to encode motion cues …

Understanding Training-free Diffusion Guidance: Mechanisms and Limitations

Y Shen, X Jiang, Y Wang, Y Yang, D Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Adding additional control to pretrained diffusion models has become an increasingly
popular research area, with extensive applications in computer vision, reinforcement …

ZoDi: Zero-Shot Domain Adaptation with Diffusion-Based Image Transfer

H Azuma, Y Matsui, A Maki - arXiv preprint arXiv:2403.13652, 2024 - arxiv.org
Deep learning models achieve high accuracy in segmentation among other tasks, yet
domain shift often degrades their performance, which can be critical in real-world …