StableVideo: Text-driven consistency-aware diffusion video editing

W Chai, X Guo, G Wang, Y Lu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Diffusion-based methods can generate realistic images and videos, but they struggle to edit
existing objects in a video while preserving their geometry over time. This prevents diffusion …

Back to optimization: Diffusion-based zero-shot 3d human pose estimation

Z Jiang, Z Zhou, L Li, W Chai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Learning-based methods have dominated 3D human pose estimation (HPE) tasks, achieving significantly better performance on most benchmarks than traditional optimization-based …

Chasing consistency in text-to-3d generation from a single image

Y Ouyang, W Chai, J Ye, D Tao, Y Zhan… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-to-3D generation from a single-view image is a popular but challenging task in 3D
vision. Although numerous methods have been proposed, existing works still suffer from the …

VersaT2I: Improving text-to-image models with versatile reward

J Guo, W Chai, J Deng, HW Huang, T Ye, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data,
demonstrating impressive performance. However, these T2I models still struggle to produce …

CityGen: Infinite and controllable 3d city layout generation

J Deng, W Chai, J Guo, Q Huang, W Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
City layout generation has recently gained significant attention. The goal of this task is to
automatically generate the layout of a city scene, including elements such as roads …

Do we really need a complex agent system? Distill embodied agent into a single model

Z Zhao, K Ma, W Chai, X Wang, K Chen, D Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the power of large language models (LLMs), open-ended embodied agents can flexibly
understand human instructions, generate interpretable guidance strategies, and output …

Learning Diffusion Texture Priors for Image Restoration

T Ye, S Chen, W Chai, Z Xing, J Qin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have shown remarkable performance in image generation tasks and are capable of generating diverse and realistic image content. When adopting diffusion models …

DiFashion: Towards Personalized Outfit Generation

Y Xu, W Wang, F Feng, Y Ma, J Zhang, X He - arXiv preprint arXiv …, 2024 - arxiv.org
The evolution of Outfit Recommendation (OR) in the realm of fashion has progressed
through two distinct phases: Pre-defined Outfit Recommendation and Personalized Outfit …

Diffusion Models for Generative Outfit Recommendation

Y Xu, W Wang, F Feng, Y Ma, J Zhang… - Proceedings of the 47th …, 2024 - dl.acm.org
Outfit Recommendation (OR) in the fashion domain has evolved through two stages: Pre-
defined Outfit Recommendation and Personalized Outfit Composition. However, both stages …

User-aware prefix-tuning is a good learner for personalized image captioning

X Wang, G Wang, W Chai, J Zhou, G Wang - Chinese Conference on …, 2023 - Springer
Image captioning bridges the gap between vision and language by automatically generating
natural language descriptions for images. Traditional image captioning methods often …