Z Tan, M Yang, L Qin, H Yang, Y Qian, Q Zhou… - … on Computer Vision, 2025 - Springer
One critical prerequisite for faithful text-to-image generation is the accurate understanding of
text inputs. Existing methods leverage the text encoder of the CLIP model to represent input …