Prompt-to-prompt image editing with cross attention control

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

被引用次数：150 相关文章所有 4 个版本

[PDF] arxiv.org

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

被引用次数：179 相关文章所有 11 个版本

[PDF] thecvf.com

Adding conditional control to text-to-image diffusion models

L Zhang, A Rao, M Agrawala - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …

被引用次数：1896 相关文章所有 6 个版本

[PDF] thecvf.com

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

N Ruiz, Y Li, V Jampani, Y Pritch… - Proceedings of the …, 2023 - openaccess.thecvf.com

Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-
quality and diverse synthesis of images from a given text prompt. However, these models …

被引用次数：1533 相关文章所有 6 个版本

[PDF] thecvf.com

Instructpix2pix: Learning to follow image editing instructions

T Brooks, A Holynski, AA Efros - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We propose a method for editing images from human instructions: given an input image and
a written instruction that tells the model what to do, our model follows these instructions to …

被引用次数：913 相关文章所有 7 个版本

[PDF] thecvf.com

Align your latents: High-resolution video synthesis with latent diffusion models

A Blattmann, R Rombach, H Ling… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding
excessive compute demands by training a diffusion model in a compressed lower …

被引用次数：482 相关文章所有 6 个版本

[PDF] thecvf.com

Multi-concept customization of text-to-image diffusion

N Kumari, B Zhang, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

While generative models produce high-quality images of concepts learned from a large-
scale database, a user often wishes to synthesize instantiations of their own concepts (for …

被引用次数：432 相关文章所有 5 个版本

[PDF] aaai.org

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L Xie, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …

被引用次数：461 相关文章所有 3 个版本

[PDF] thecvf.com

Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

JZ Wu, Y Ge, X Wang, SW Lei, Y Gu… - Proceedings of the …, 2023 - openaccess.thecvf.com

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale
video datasets to train a text-to-video (T2V) generator. Despite their promising results, such …

被引用次数：420 相关文章所有 4 个版本

[PDF] thecvf.com

Null-text inversion for editing real images using guided diffusion models

R Mokady, A Hertz, K Aberman… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent large-scale text-guided diffusion models provide powerful image generation
capabilities. Currently, a massive effort is given to enable the modification of these images …

被引用次数：470 相关文章所有 5 个版本

高级搜索

QQ 群