Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning

J Ma, J Liang, C Chen, H Lu - ACM SIGGRAPH 2024 Conference …, 2024 - dl.acm.org
Recent progress in personalized image generation using diffusion models has been
significant. However, development in the area of open-domain and test-time fine-tuning-free …

Mastering text-to-image diffusion: Recaptioning, planning, and generating with multimodal LLMs

L Yang, Z Yu, C Meng, M Xu, S Ermon… - Forty-first International …, 2024 - openreview.net
Diffusion models have exhibited exceptional performance in text-to-image generation and
editing. However, existing methods often face challenges when handling complex text …

T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

Unsupervised compositional concepts discovery with text-to-image generative models

N Liu, Y Du, S Li, JB Tenenbaum… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-to-image generative models have enabled high-resolution image synthesis across
different domains, but require users to specify the content they wish to generate. In this …

ProxEdit: Improving tuning-free real image editing with proximal guidance

L Han, S Wen, Q Chen, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
DDIM inversion has revealed the remarkable potential of real image editing within
diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier …

DiffBlender: Scalable and composable multimodal text-to-image diffusion models

S Kim, J Lee, K Hong, D Kim, N Ahn - arXiv preprint arXiv:2305.15194, 2023 - arxiv.org
The recent progress in diffusion-based text-to-image generation models has significantly
expanded generative capabilities via conditioning the text descriptions. However, since …

Attention Calibration for Disentangled Text-to-Image Personalization

Y Zhang, M Yang, Q Zhou… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recent thrilling progress in large-scale text-to-image (T2I) models has unlocked
unprecedented synthesis quality of AI-generated content (AIGC) including image generation …

Divide and conquer: Language models can plan and self-correct for compositional text-to-image generation

Z Wang, E Xie, A Li, Z Wang, X Liu, Z Li - arXiv preprint arXiv:2401.15688, 2024 - arxiv.org
Despite significant advancements in text-to-image models for generating high-quality
images, these methods still struggle to ensure the controllability of text prompts over images …

MaskDiffusion: Boosting text-to-image consistency with conditional mask

Y Zhou, D Zhou, Y Wang, J Feng, Q Hou - International Journal of …, 2024 - Springer
Recent advancements in diffusion models have showcased their impressive capacity to
generate visually striking images. However, ensuring a close match between the generated …

A survey of diffusion based image generation models: Issues and their solutions

T Zhang, Z Wang, J Huang, MM Tasnim… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, there has been significant progress in the development of large models. Following
the success of ChatGPT, numerous language models have been introduced, demonstrating …