Eclipse: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2),
achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks at …

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L Xie, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …

Swiftbrush: One-step text-to-image diffusion model with variational score distillation

TH Nguyen, A Tran - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Despite their ability to generate high-resolution and diverse images from text prompts,
text-to-image diffusion models often suffer from slow iterative sampling processes. Model distillation …

Are diffusion models vision-and-language reasoners?

B Krojer, E Poole-Dayan, V Voleti, C Pal… - … seventh Conference on …, 2023 - openreview.net
Text-conditioned image generation models have recently shown immense qualitative
success using denoising diffusion processes. However, unlike discriminative vision-and …

Initno: Boosting text-to-image diffusion models via initial noise optimization

X Guo, J Liu, M Cui, J Li, H Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent strides in the development of diffusion models, exemplified by advancements such as
Stable Diffusion, have underscored their remarkable prowess in generating visually …

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Z Wang, L Wei, T Wang, H Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image (T2I) generative models have recently emerged as a powerful tool,
enabling the creation of photo-realistic images and giving rise to a multitude of applications …

COMCAT: towards efficient compression and customization of attention-based vision models

J Xiao, M Yin, Y Gong, X Zang, J Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
Attention-based vision models, such as Vision Transformer (ViT) and its variants, have
shown promising performance in various computer vision tasks. However, these emerging …

Unleashing text-to-image diffusion models for visual perception

W Zhao, Y Rao, Z Liu, B Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion models (DMs) have become the new trend of generative models and have
demonstrated a powerful ability of conditional synthesis. Among those, text-to-image …

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

M Patel, S Jung, C Baral, Y Yang - arXiv preprint arXiv:2402.05195, 2024 - arxiv.org
Despite the recent advances in personalized text-to-image (P-T2I) generative models,
subject-driven T2I remains challenging. The primary bottlenecks include 1) Intensive training …

Cross-attention makes inference cumbersome in text-to-image diffusion models

W Zhang, H Liu, J Xie, F Faccio, MZ Shou… - arXiv preprint arXiv …, 2024 - arxiv.org
This study explores the role of cross-attention during inference in text-conditional diffusion
models. We find that cross-attention outputs converge to a fixed point after few inference …