Collage diffusion

AnyDoor: Zero-shot object-level image customization

X Chen, L Huang, Y Liu, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …

Cited by 212 · 3 versions

FastComposer: Tuning-free multi-subject image generation with localized attention

G Xiao, T Yin, WT Freeman, F Durand… - International Journal of …, 2024 - Springer

Diffusion models excel at text-to-image generation, especially in subject-driven generation
for personalized images. However, existing methods are inefficient due to the subject …

Cited by 171 · 2 versions

Expressive text-to-image generation with rich text

S Ge, T Park, JY Zhu, JB Huang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Plain text has become a prevalent interface for text-to-image synthesis. However, its limited
customization options hinder users from accurately describing desired outputs. For example …

Cited by 75 · 6 versions

Direct-a-video: Customized video generation with user-directed camera movement and object motion

S Yang, L Hou, H Huang, C Ma, P Wan… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org

Recent text-to-video diffusion models have achieved impressive progress. In practice, users
often desire the ability to control object motion and camera movement independently for …

Cited by 53 · 2 versions

Advances in diffusion models for image data augmentation: A review of methods, models, evaluation metrics and future research directions

P Alimisis, I Mademlis, P Radoglou-Grammatikis… - Artificial Intelligence …, 2025 - Springer

Image data augmentation constitutes a critical methodology in modern computer vision
tasks, since it can help enhance the diversity and quality of training datasets; …

Cited by 4 · 3 versions

Divide, evaluate, and refine: Evaluating and improving text-to-image alignment with iterative VQA feedback

J Singh, L Zheng - Advances in Neural Information …, 2023 - proceedings.neurips.cc

The field of text-conditioned image generation has made unparalleled progress with the
recent advent of latent diffusion models. While revolutionary, as the complexity of given text …

Cited by 18 · 5 versions

High-fidelity Person-centric Subject-to-Image Synthesis

Y Wang, W Zhang, J Zheng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Current subject-driven image generation methods encounter significant challenges in
person-centric image generation. The reason is that they learn the semantic scene and …

Cited by 13 · 3 versions

Move Anything with Layered Scene Diffusion

J Ren, M Xu, JC Wu, Z Liu, T Xiang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Diffusion models generate images with an unprecedented level of quality, but how can we
freely rearrange image layouts? Recent works generate controllable scenes via learning …

Cited by 5 · 3 versions

Iterative motion editing with natural language

P Goel, KC Wang, CK Liu, K Fatahalian - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org

Text-to-motion diffusion models can generate realistic animations from text prompts, but do
not support fine-grained motion editing controls. In this paper, we present a method for using …

Cited by 13 · 2 versions

Cocktail: Mixing multi-modality control for text-conditional image generation

M Hu, J Zheng, D Liu, C Zheng, C Wang… - … on Neural Information …, 2023 - openreview.net

Text-conditional diffusion models are able to generate high-fidelity images with diverse
contents. However, linguistic representations frequently exhibit ambiguous descriptions of …

Cited by 19 · 6 versions