SparseCtrl: Adding sparse controls to text-to-video diffusion models

Y Guo, C Yang, A Rao, M Agrawala, D Lin… - European Conference on …, 2025 - Springer
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …

VideoBooth: Diffusion-based video generation with image prompts

Y Jiang, T Wu, S Yang, C Si, D Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-driven video generation has witnessed rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …

ReVersion: Diffusion-based relation inversion from images

Z Huang, T Wu, Y Jiang, KCK Chan, Z Liu - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Diffusion models have gained increasing popularity for their generative capabilities. Recently, there
has been a surging need to generate customized images by inverting diffusion models from …

VCoder: Versatile vision encoders for multimodal large language models

J Jain, J Yang, H Shi - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Humans possess the remarkable skill of Visual Perception: the ability to see and understand
the seen, helping them make sense of the visual world and, in turn, reason. Multimodal Large …

Smooth diffusion: Crafting smooth latent spaces in diffusion models

J Guo, X Xu, Y Pu, Z Ni, C Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation,
synthesizing images with high fidelity and diverse content. Despite this advancement, latent …

DemoCaricature: Democratising caricature generation with a rough sketch

DY Chen, AK Bhunia, S Koley, A Sain… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we democratise caricature generation, empowering individuals to effortlessly
craft personalised caricatures with just a photo and a conceptual sketch. Our objective is to …

Diffusion for natural image matting

Y Hu, Y Lin, W Wang, Y Zhao, Y Wei, H Shi - European Conference on …, 2025 - Springer
Existing natural image matting algorithms inevitably have flaws in their predictions on
difficult cases, and their one-step prediction manner cannot further correct these errors. In …

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

W Ouyang, Y Dong, L Yang, J Si, X Pan - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
The remarkable generative capabilities of diffusion models have motivated extensive
research in both image and video editing. Compared to video editing, which faces additional …

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

S Yang, Y Zhou, Z Liu, CC Loy - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The remarkable efficacy of text-to-image diffusion models has motivated extensive
exploration of their potential application in video domains. Zero-shot methods seek to extend …

Human image generation: A comprehensive survey

Z Jia, Z Zhang, L Wang, T Tan - ACM Computing Surveys, 2024 - dl.acm.org
Image and video synthesis has become a blooming topic in the computer vision and machine
learning communities, along with the development of deep generative models, due to its …