Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

CodeTalker: Speech-driven 3D facial animation with discrete motion prior

J Xing, M Xia, Y Zhang, X Cun… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven 3D facial animation has been widely studied, yet there is still a gap in
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …

Seeing what you said: Talking face generation guided by a lip reading expert

J Wang, X Qian, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Talking face generation, also known as speech-to-lip generation, reconstructs facial motions
around the lips given coherent speech input. Previous studies revealed the importance …

CelebV-HQ: A large-scale video facial attributes dataset

H Zhu, W Wu, W Zhu, L Jiang, S Tang, L Zhang… - European conference on …, 2022 - Springer
Large-scale datasets have played indispensable roles in the recent success of face
generation/editing and significantly facilitated the advances of emerging research fields …

Progressive disentangled representation learning for fine-grained controllable talking head synthesis

D Wang, Y Deng, Z Yin, HY Shum… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a novel one-shot talking head synthesis method that achieves disentangled and
fine-grained control over lip motion, eye gaze and blink, head pose, and emotional expression …

Semantic-aware implicit neural audio-driven video portrait generation

X Liu, Y Xu, Q Wu, H Zhou, W Wu, B Zhou - European conference on …, 2022 - Springer
Animating high-fidelity video portrait with speech audio is crucial for virtual reality and digital
entertainment. While most previous studies rely on accurate explicit structural information …

StyleSync: High-fidelity generalized and personalized lip sync in style-based generator

J Guan, Z Zhang, H Zhou, T Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …

StyleTalk: One-shot talking head generation with controllable speaking styles

Y Ma, S Wang, Z Hu, C Fan, T Lv, Y Ding… - Proceedings of the …, 2023 - ojs.aaai.org
Different people speak with diverse personalized speaking styles. Although existing one-
shot talking head methods have made significant progress in lip sync, natural facial …

EMMN: Emotional motion memory network for audio-driven emotional talking face generation

S Tan, B Ji, Y Pan - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Synthesizing expression is essential to create realistic talking faces. Previous works
consider expressions and mouth shapes as a whole and predict them solely from audio …