SparseCtrl: Adding sparse controls to text-to-video diffusion models

Y Guo, C Yang, A Rao, M Agrawala, D Lin… - European Conference on …, 2025 - Springer
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …

VideoBooth: Diffusion-based video generation with image prompts

Y Jiang, T Wu, S Yang, C Si, D Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-driven video generation has witnessed rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …

ReVersion: Diffusion-based relation inversion from images

Z Huang, T Wu, Y Jiang, KCK Chan, Z Liu - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Diffusion models have gained increasing popularity for their generative capabilities. Recently, there
has been a surging need to generate customized images by inverting diffusion models from …

VCoder: Versatile vision encoders for multimodal large language models

J Jain, J Yang, H Shi - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Humans possess the remarkable skill of Visual Perception: the ability to see and understand
the seen, helping them make sense of the visual world and, in turn, reason. Multimodal Large …

Smooth diffusion: Crafting smooth latent spaces in diffusion models

J Guo, X Xu, Y Pu, Z Ni, C Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation,
synthesizing images with high fidelity and diverse content. Despite this advancement, latent …

DemoCaricature: Democratising caricature generation with a rough sketch

DY Chen, AK Bhunia, S Koley, A Sain… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we democratise caricature generation, empowering individuals to effortlessly
craft personalised caricatures with just a photo and a conceptual sketch. Our objective is to …

Diffusion for natural image matting

Y Hu, Y Lin, W Wang, Y Zhao, Y Wei, H Shi - European Conference on …, 2025 - Springer
Existing natural image matting algorithms inevitably have flaws in their predictions on
difficult cases, and their one-step prediction manner cannot further correct these errors. In …

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

W Ouyang, Y Dong, L Yang, J Si, X Pan - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
The remarkable generative capabilities of diffusion models have motivated extensive
research in both image and video editing. Compared to video editing, which faces additional …

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

S Yang, Y Zhou, Z Liu, CC Loy - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The remarkable efficacy of text-to-image diffusion models has motivated extensive
exploration of their potential application in video domains. Zero-shot methods seek to extend …

Human image generation: A comprehensive survey

Z Jia, Z Zhang, L Wang, T Tan - ACM Computing Surveys, 2024 - dl.acm.org
Image and video synthesis has become a blooming topic in the computer vision and machine
learning communities, along with the development of deep generative models, due to its …