State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

L Tian, Q Wang, B Zhang, L Bo - European Conference on Computer …, 2024 - Springer
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking
head video generation by focusing on the dynamic and nuanced relationship between audio …

Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis

J Li, J Zhang, X Bai, J Zhou… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF)-based
architecture for talking portrait synthesis that can concurrently achieve fast convergence, real …

Generative technology for human emotion recognition: A scoping review

F Ma, Y Yuan, Y Xie, H Ren, I Liu, Y He, F Ren, FR Yu… - Information …, 2024 - Elsevier
Affective computing stands at the forefront of artificial intelligence (AI), seeking to imbue
machines with the ability to comprehend and respond to human emotions. Central to this …

Deepfake generation and detection: A benchmark and survey

G Pei, J Zhang, M Hu, Z Zhang, C Wang, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfake is a technology dedicated to creating highly realistic facial images and videos
under specific conditions, which has significant application potential in fields such as …

DAE-Talker: High-fidelity speech-driven talking face generation with diffusion autoencoder

C Du, Q Chen, T He, X Tan, X Chen, K Yu… - Proceedings of the 31st …, 2023 - dl.acm.org
While recent research has made significant progress in speech-driven talking face
generation, the quality of the generated video still lags behind that of real recordings. One …

DiffusionAvatars: Deferred diffusion for high-fidelity 3D head avatars

T Kirschstein, S Giebenhain… - Proceedings of the …, 2024 - openaccess.thecvf.com
DiffusionAvatars synthesizes a high-fidelity 3D head avatar of a person, offering intuitive
control over both pose and expression. We propose a diffusion-based neural renderer that …

FaceTalk: Audio-driven motion diffusion for neural parametric head models

S Aneja, J Thies, A Dai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity
3D motion sequences of talking human heads from an input audio signal. To capture the …

DreamTalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

Hallo: Hierarchical audio-driven visual synthesis for portrait image animation

M Xu, H Li, Q Su, H Shang, L Zhang, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of portrait image animation, driven by speech audio input, has experienced
significant advancements in the generation of realistic and dynamic portraits. This research …