W Mai, J Zhang, P Fang, Z Zhang - arXiv preprint arXiv:2401.00430, 2023 - arxiv.org
In the era of Artificial Intelligence Generated Content (AIGC), conditional multimodal
synthesis technologies (e.g., text-to-image, text-to-video, text-to-audio, etc.) are gradually …