Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based …
Y Ouyang, W Chai, J Ye, D Tao, Y Zhan… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-to-3D generation from a single-view image is a popular but challenging task in 3D vision. Although numerous methods have been proposed, existing works still suffer from the …
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce …
City layout generation has recently gained significant attention. The goal of this task is to automatically generate the layout of a city scene, including elements such as roads …
With the power of large language models (LLMs), open-ended embodied agents can flexibly understand human instructions, generate interpretable guidance strategies, and output …
T Ye, S Chen, W Chai, Z Xing, J Qin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion Models have shown remarkable performance in image generation tasks and are capable of generating diverse and realistic image content. When adopting diffusion models …
The evolution of Outfit Recommendation (OR) in the realm of fashion has progressed through two distinct phases: Pre-defined Outfit Recommendation and Personalized Outfit …
Y Xu, W Wang, F Feng, Y Ma, J Zhang… - Proceedings of the 47th …, 2024 - dl.acm.org
Outfit Recommendation (OR) in the fashion domain has evolved through two stages: Pre-defined Outfit Recommendation and Personalized Outfit Composition. However, both stages …
Image captioning bridges the gap between vision and language by automatically generating natural language descriptions for images. Traditional image captioning methods often …