Y Han, R Wang, C Zhang, J Hu, P Cheng, B Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in image generation have enabled the creation of high-quality
images from text conditions. However, when facing multi-modal conditions, such as text …