AnyDoor: Zero-shot object-level image customization

X Chen, L Huang, Y Liu, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …

FastComposer: Tuning-free multi-subject image generation with localized attention

G Xiao, T Yin, WT Freeman, F Durand… - International Journal of …, 2024 - Springer
Diffusion models excel at text-to-image generation, especially in subject-driven generation
for personalized images. However, existing methods are inefficient due to the subject …

Expressive text-to-image generation with rich text

S Ge, T Park, JY Zhu, JB Huang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …

Direct-a-Video: Customized video generation with user-directed camera movement and object motion

S Yang, L Hou, H Huang, C Ma, P Wan… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Recent text-to-video diffusion models have achieved impressive progress. In practice, users
often desire the ability to control object motion and camera movement independently for …

Advances in diffusion models for image data augmentation: A review of methods, models, evaluation metrics and future research directions

P Alimisis, I Mademlis, P Radoglou-Grammatikis… - Artificial Intelligence …, 2025 - Springer
Image data augmentation constitutes a critical methodology in modern computer vision
tasks, since it can facilitate towards enhancing the diversity and quality of training datasets; …

Divide, evaluate, and refine: Evaluating and improving text-to-image alignment with iterative VQA feedback

J Singh, L Zheng - Advances in Neural Information …, 2023 - proceedings.neurips.cc
The field of text-conditioned image generation has made unparalleled progress with the
recent advent of latent diffusion models. While revolutionary, as the complexity of given text …

High-fidelity Person-centric Subject-to-Image Synthesis

Y Wang, W Zhang, J Zheng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Current subject-driven image generation methods encounter significant challenges in
person-centric image generation. The reason is that they learn the semantic scene and …

Move Anything with Layered Scene Diffusion

J Ren, M Xu, JC Wu, Z Liu, T Xiang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models generate images with an unprecedented level of quality, but how can we
freely rearrange image layouts? Recent works generate controllable scenes via learning …

Iterative motion editing with natural language

P Goel, KC Wang, CK Liu, K Fatahalian - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Text-to-motion diffusion models can generate realistic animations from text prompts, but do
not support fine-grained motion editing controls. In this paper, we present a method for using …

Cocktail: Mixing multi-modality control for text-conditional image generation

M Hu, J Zheng, D Liu, C Zheng, C Wang… - … on Neural Information …, 2023 - openreview.net
Text-conditional diffusion models are able to generate high-fidelity images with diverse
contents. However, linguistic representations frequently exhibit ambiguous descriptions of …