Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis

J Li, J Zhang, X Bai, J Zhou… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF)-based
architecture for talking portrait synthesis that can concurrently achieve fast convergence, real …

DreamTalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

DiffSHEG: A diffusion-based approach for real-time speech-driven holistic 3D expression and gesture generation

J Chen, Y Liu, J Wang, A Zeng, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D
Expression and Gesture generation. While previous works focused on co-speech gesture or …

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

C Xu, Y Liu, J Xing, W Wang, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we abstract the process of people hearing speech, extracting meaningful cues,
and creating various dynamically audio-consistent talking faces, termed Listening and …

Revisiting generalizability in deepfake detection: Improving metrics and stabilizing transfer

S Kamat, S Agarwal, T Darrell… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract" Generalizability" is seen as the hallmark quality of a good deepfake detection
model. However, standard out-of-domain evaluation datasets are very similar in form to the …

MODA: Mapping-once audio-driven portrait animation with dual attentions

Y Liu, L Lin, F Yu, C Zhou, Y Li - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Audio-driven portrait animation aims to synthesize portrait videos that are conditioned on
the given audio. Animating high-fidelity and multimodal video portraits has a variety of …

Face generation and editing with stylegan: A survey

A Melnik, M Miasayedzenkau… - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Our goal with this survey is to provide an overview of state-of-the-art deep learning
methods for face generation and editing using StyleGAN. The survey covers the evolution of …

Multimodal-driven talking face generation via a unified diffusion-based generator

C Xu, S Zhu, J Zhu, T Huang, J Zhang, Y Tai… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal-driven talking face generation refers to animating a portrait with the given pose,
expression, and gaze transferred from the driving image and video, or estimated from the …

Make your actor talk: Generalizable and high-fidelity lip sync with motion and appearance disentanglement

R Yu, T He, A Zhang, Y Wang, J Guo, X Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
We aim to edit the lip movements in a talking video according to the given speech while
preserving the personal identity and visual details. The task can be decomposed into two …

DaGAN++: Depth-aware generative adversarial network for talking head video generation

FT Hong, L Shen, D Xu - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Predominant techniques for talking head generation largely depend on 2D information,
including facial appearances and motions from input face images. Nevertheless, dense 3D …