Compositional text-to-image synthesis with attention map control of diffusion models

J Ma, J Liang, C Chen, H Lu - ACM SIGGRAPH 2024 Conference …, 2024 - dl.acm.org

Recent progress in personalized image generation using diffusion models has been
significant. However, development in the area of open-domain and test-time fine-tuning-free …

Cited by 101 · Related articles · All 3 versions

[PDF] openreview.net

Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs

L Yang, Z Yu, C Meng, M Xu, S Ermon… - Forty-first International …, 2024 - openreview.net

Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …

Cited by 84 · Related articles · All 3 versions

[PDF] arxiv.org

T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

Cited by 17 · Related articles · All 3 versions

[PDF] thecvf.com

Unsupervised compositional concepts discovery with text-to-image generative models

N Liu, Y Du, S Li, JB Tenenbaum… - Proceedings of the …, 2023 - openaccess.thecvf.com

Text-to-image generative models have enabled high-resolution image synthesis across
different domains, but require users to specify the content they wish to generate. In this …

Cited by 27 · Related articles · All 5 versions

[PDF] thecvf.com

ProxEdit: Improving tuning-free real image editing with proximal guidance

L Han, S Wen, Q Chen, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

DDIM inversion has revealed the remarkable potential of real image editing within
diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier …

Cited by 36 · Related articles · All 4 versions

[PDF] arxiv.org

DiffBlender: Scalable and composable multimodal text-to-image diffusion models

S Kim, J Lee, K Hong, D Kim, N Ahn - arXiv preprint arXiv:2305.15194, 2023 - arxiv.org

The recent progress in diffusion-based text-to-image generation models has significantly
expanded generative capabilities via conditioning the text descriptions. However, since …

Cited by 16 · Related articles · All 2 versions

[PDF] thecvf.com

Attention Calibration for Disentangled Text-to-Image Personalization

Y Zhang, M Yang, Q Zhou… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Recent thrilling progress in large-scale text-to-image (T2I) models has unlocked
unprecedented synthesis quality of AI-generated content (AIGC) including image generation …

Cited by 5 · Related articles · All 3 versions

[PDF] arxiv.org

Divide and conquer: Language models can plan and self-correct for compositional text-to-image generation

Z Wang, E Xie, A Li, Z Wang, X Liu, Z Li - arXiv preprint arXiv:2401.15688, 2024 - arxiv.org

Despite significant advancements in text-to-image models for generating high-quality
images, these methods still struggle to ensure the controllability of text prompts over images …

Cited by 14 · Related articles · All 2 versions

[PDF] arxiv.org

MaskDiffusion: Boosting text-to-image consistency with conditional mask

Y Zhou, D Zhou, Y Wang, J Feng, Q Hou - International Journal of …, 2024 - Springer

Recent advancements in diffusion models have showcased their impressive capacity to
generate visually striking images. However, ensuring a close match between the generated …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

A survey of diffusion based image generation models: Issues and their solutions

T Zhang, Z Wang, J Huang, MM Tasnim… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, there has been significant progress in the development of large models. Following
the success of ChatGPT, numerous language models have been introduced, demonstrating …

Cited by 22 · Related articles · All 2 versions
