Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis

J Li, J Zhang, X Bai, J Zhou… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF)-based
architecture for talking portrait synthesis that can concurrently achieve fast convergence, real …

DreamTalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

DiffSHEG: A diffusion-based approach for real-time speech-driven holistic 3D expression and gesture generation

J Chen, Y Liu, J Wang, A Zeng, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D
Expression and Gesture generation. While previous works focused on co-speech gesture or …

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

C Xu, Y Liu, J Xing, W Wang, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we abstract the process of people hearing speech, extracting meaningful cues,
and creating various dynamically audio-consistent talking faces, termed Listening and …

Revisiting generalizability in deepfake detection: Improving metrics and stabilizing transfer

S Kamat, S Agarwal, T Darrell… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract" Generalizability" is seen as the hallmark quality of a good deepfake detection
model. However, standard out-of-domain evaluation datasets are very similar in form to the …

MODA: Mapping-once audio-driven portrait animation with dual attentions

Y Liu, L Lin, F Yu, C Zhou, Y Li - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Audio-driven portrait animation aims to synthesize portrait videos that are conditioned on
the given audio. Animating high-fidelity and multimodal video portraits has a variety of …

Face generation and editing with stylegan: A survey

A Melnik, M Miasayedzenkau… - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Our goal with this survey is to provide an overview of state-of-the-art deep learning
methods for face generation and editing using StyleGAN. The survey covers the evolution of …

Multimodal-driven talking face generation via a unified diffusion-based generator

C Xu, S Zhu, J Zhu, T Huang, J Zhang, Y Tai… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal-driven talking face generation refers to animating a portrait with the given pose,
expression, and gaze transferred from the driving image and video, or estimated from the …

Make your actor talk: Generalizable and high-fidelity lip sync with motion and appearance disentanglement

R Yu, T He, A Zhang, Y Wang, J Guo, X Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
We aim to edit the lip movements in a talking video according to the given speech while
preserving the personal identity and visual details. The task can be decomposed into two …

DaGAN++: Depth-aware generative adversarial network for talking head video generation

FT Hong, L Shen, D Xu - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Predominant techniques for talking head generation largely depend on 2D information,
including facial appearances and motions from input face images. Nevertheless, dense 3D …