A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Speech driven talking face generation from a single image and an emotion condition

SE Eskimez, Y Zhang, Z Duan - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Visual emotion expression plays an important role in audiovisual speech communication. In
this work, we propose a novel approach to rendering visual emotion expression in speech …

Talking human face generation: A survey

M Toshpulatov, W Lee, S Lee - Expert Systems with Applications, 2023 - Elsevier
Talking human face generation aims at synthesizing a natural human face that talks in
correspondence to the given text or audio series. Implementing the recently developed …

Speech driven video editing via an audio-conditioned diffusion model

D Bigioi, S Basak, M Stypułkowski, M Zieba… - Image and Vision …, 2024 - Elsevier
Taking inspiration from recent developments in visual generative tasks using diffusion
models, we propose a method for end-to-end speech-driven video editing using a denoising …

Deep person generation: A survey from the perspective of face, pose, and cloth synthesis

T Sha, W Zhang, T Shen, Z Li, T Mei - ACM Computing Surveys, 2023 - dl.acm.org
Deep person generation has attracted extensive research attention due to its wide
applications in virtual agents, video conferencing, online shopping, and art/movie …

Expression-tailored talking face generation with adaptive cross-modal weighting

D Zeng, S Zhao, J Zhang, H Liu, K Li - Neurocomputing, 2022 - Elsevier
The key of talking face generation is to synthesize the identity-preserving natural facial
expressions with accurate audio-lip synchronization. To accomplish this, it requires to …

Talking head generation with audio and speech related facial action units

S Chen, Z Liu, J Liu, Z Yan, L Wang - arXiv preprint arXiv:2110.09951, 2021 - arxiv.org
The task of talking head generation is to synthesize a lip synchronized talking head video by
inputting an arbitrary face image and audio clips. Most existing methods ignore the local …

Speech2video: Cross-modal distillation for speech to video generation

S Si, J Wang, X Qu, N Cheng, W Wei, X Zhu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper investigates a novel task of talking face video generation solely from speeches.
The speech-to-video generation technique can spark interesting applications in …

Talking face generation via facial anatomy

S Liu, H Wang - ACM Transactions on Multimedia Computing …, 2023 - dl.acm.org
To generate the corresponding talking face from a speech audio and a face image, it is
essential to match the variations in the facial appearance with the speech audio in subtle …