Slight Corruption in Pre-training Data Makes Better Diffusion Models

H Chen, Y Han, D Misra, X Li, K Hu, D Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models (DMs) have shown remarkable capabilities in generating realistic
high-quality images, audios, and videos. They benefit significantly from extensive pre-training on …

Is Your Text-to-Image Model Robust to Caption Noise?

W Yu, Z Yang, S Lin, Q Zhao, J Wang, L Gui… - arXiv preprint arXiv …, 2024 - arxiv.org
In text-to-image (T2I) generation, a prevalent training technique involves utilizing Vision
Language Models (VLMs) for image re-captioning. Even though VLMs are known to exhibit …