End-to-end generation of talking faces from noisy speech

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 207 · Related articles · All 6 versions

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Cited by 41 · Related articles · All 9 versions

Speech driven talking face generation from a single image and an emotion condition

SE Eskimez, Y Zhang, Z Duan - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Visual emotion expression plays an important role in audiovisual speech communication. In
this work, we propose a novel approach to rendering visual emotion expression in speech …

Cited by 91 · Related articles · All 6 versions

Talking human face generation: A survey

M Toshpulatov, W Lee, S Lee - Expert Systems with Applications, 2023 - Elsevier

Talking human face generation aims at synthesizing a natural human face that talks in
correspondence to the given text or audio series. Implementing the recently developed …

Cited by 24 · Related articles · All 3 versions

Speech driven video editing via an audio-conditioned diffusion model

D Bigioi, S Basak, M Stypułkowski, M Zieba… - Image and Vision …, 2024 - Elsevier

Taking inspiration from recent developments in visual generative tasks using diffusion
models, we propose a method for end-to-end speech-driven video editing using a denoising …

Cited by 29 · Related articles · All 5 versions

Deep person generation: A survey from the perspective of face, pose, and cloth synthesis

T Sha, W Zhang, T Shen, Z Li, T Mei - ACM Computing Surveys, 2023 - dl.acm.org

Deep person generation has attracted extensive research attention due to its wide
applications in virtual agents, video conferencing, online shopping, and art/movie …

Cited by 40 · Related articles · All 3 versions

Expression-tailored talking face generation with adaptive cross-modal weighting

D Zeng, S Zhao, J Zhang, H Liu, K Li - Neurocomputing, 2022 - Elsevier

The key of talking face generation is to synthesize the identity-preserving natural facial
expressions with accurate audio-lip synchronization. To accomplish this, it requires to …

Cited by 10 · Related articles · All 2 versions

Talking head generation with audio and speech related facial action units

S Chen, Z Liu, J Liu, Z Yan, L Wang - arXiv preprint arXiv:2110.09951, 2021 - arxiv.org

The task of talking head generation is to synthesize a lip synchronized talking head video by
inputting an arbitrary face image and audio clips. Most existing methods ignore the local …

Cited by 19 · Related articles · All 5 versions

Speech2video: Cross-modal distillation for speech to video generation

S Si, J Wang, X Qu, N Cheng, W Wei, X Zhu… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper investigates a novel task of talking face video generation solely from speeches.
The speech-to-video generation technique can spark interesting applications in …

Cited by 18 · Related articles · All 6 versions

Talking face generation via facial anatomy

S Liu, H Wang - ACM Transactions on Multimedia Computing …, 2023 - dl.acm.org

To generate the corresponding talking face from a speech audio and a face image, it is
essential to match the variations in the facial appearance with the speech audio in subtle …

Cited by 11 · Related articles
