D Yang, R Dong, J Ji, Y Ma, H Wang, X Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, diffusion models have increasingly demonstrated their capabilities in vision
understanding. By leveraging prompt-based learning to construct sentences, these models …