Large language models (LLMs) have demonstrated remarkable performance on various tasks. However, the performance of LLMs heavily depends on the input prompt, which has …
B Li, Z Lin, D Pathak, J Li, Y Fei, K Wu, T Ling… - arXiv preprint arXiv …, 2024 - arxiv.org
While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order …
J Kim, Z Wang, Q Qiu - arXiv preprint arXiv:2404.00879, 2024 - arxiv.org
Efficient text-to-image generation remains a challenging task due to the high computational costs associated with the multi-step sampling in diffusion models. Although distillation of pre …