Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

Human-computer interaction system: A survey of talking-head generation

R Zhen, W Song, Q He, J Cao, L Shi, J Luo - Electronics, 2023 - mdpi.com
Virtual humans are widely employed in various industries, including personal assistance,
intelligent customer service, and online education, thanks to the rapid development of …

HumanGaussian: Text-driven 3D human generation with Gaussian splatting

X Liu, X Zhan, J Tang, Y Shan, G Zeng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Realistic 3D human generation from text prompts is a desirable yet challenging task.
Existing methods optimize 3D representations like mesh or neural fields via score distillation …

Gaussian Head Avatar: Ultra high-fidelity head avatar via dynamic Gaussians

Y Xu, B Chen, Z Li, H Zhang, L Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Creating high-fidelity 3D head avatars has always been a research hotspot but there
remains a great challenge under lightweight sparse view setups. In this paper we propose …

CodeTalker: Speech-driven 3D facial animation with discrete motion prior

J Xing, M Xia, Y Zhang, X Cun… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …

Expressive talking head generation with granular audio-visual control

B Liang, Y Pan, Z Guo, H Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating expressive talking heads is essential for creating virtual humans. However,
existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional …

Reconstructing personalized semantic facial NeRF models from monocular video

X Gao, C Zhong, J Xiang, Y Hong, Y Guo… - ACM Transactions on …, 2022 - dl.acm.org
We present a novel semantic model for human head defined with neural radiance field. The
3D-consistent head model consists of a set of disentangled and interpretable bases, and can …

DiffTalk: Crafting diffusion models for generalized audio-driven portraits animation

S Shen, W Zhao, Z Meng, W Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Talking head synthesis is a promising approach for the video production industry. Recently,
a lot of effort has been devoted in this research area to improve the generation quality or …

Learning hierarchical cross-modal association for co-speech gesture generation

X Liu, Q Wu, H Zhou, Y Xu, R Qian… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating speech-consistent body and gesture movements is a long-standing problem in
virtual avatar creation. Previous studies often synthesize pose movement in a holistic …

Identity-preserving talking face generation with landmark and appearance priors

W Zhong, C Fang, Y Cai, P Wei… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating talking face videos from audio attracts lots of research interest. A few
person-specific methods can generate vivid videos but require the target speaker's videos for …