I4VGen: Image as Stepping Stone for Text-to-Video Generation

X Guo, J Liu, M Cui, D Huang - arXiv preprint arXiv:2406.02230, 2024 - arxiv.org
Text-to-video generation has lagged behind text-to-image synthesis in quality and diversity
due to the complexity of spatio-temporal modeling and limited video-text datasets. This …

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

Y Ban, R Wang, T Zhou, B Gong, CJ Hsieh… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have achieved remarkable success in text-to-image generation tasks;
however, the role of initial noise has been rarely explored. In this study, we identify specific …

Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models

K Xu, L Zhang, J Shi - arXiv preprint arXiv:2405.14828, 2024 - arxiv.org
Recent advances in text-to-image (T2I) diffusion models have facilitated creative and
photorealistic image synthesis. By varying the random seeds, we can generate various …