Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …
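
The snippet describes zero-shot multi-speaker synthesis built on VITS. As a rough illustration (not the authors' code), the sketch below drives a published YourTTS checkpoint through the open-source Coqui TTS package; the model-name string, reference clip, and output path are assumptions for the example.

```python
# Hedged sketch: zero-shot multi-speaker synthesis with a public YourTTS
# checkpoint via Coqui TTS (pip install TTS). Model name and file paths are
# assumptions for illustration, not taken from the cited paper.
from TTS.api import TTS

# Load the multilingual YourTTS model from the Coqui model zoo.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# Zero-shot cloning: a few seconds of an unseen speaker's audio condition the
# synthesis, with no fine-tuning of the model.
tts.tts_to_file(
    text="This sentence is rendered in the reference speaker's voice.",
    speaker_wav="reference_speaker.wav",  # assumed short clip of the target voice
    language="en",
    file_path="zero_shot_output.wav",
)
```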

Overview of voice conversion methods based on deep learning

T Walczyna, Z Piotrowski - Applied sciences, 2023 - mdpi.com
Voice conversion is a process where the essence of a speaker's identity is seamlessly
transferred to another speaker, all while preserving the content of their speech. This usage is …
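
To make the recombination idea behind many deep-learning voice conversion systems concrete, here is a minimal, hypothetical PyTorch skeleton (not from the cited survey): a content encoder maps the source utterance to a nominally speaker-free code, a target-speaker embedding reintroduces identity, and a decoder produces the converted features.

```python
# Hypothetical sketch of the content/speaker split common in deep-learning VC.
import torch
import torch.nn as nn

class TinyVoiceConverter(nn.Module):
    def __init__(self, n_mels=80, content_dim=128, speaker_dim=64, n_speakers=10):
        super().__init__()
        # Content encoder: source mel frames -> (nominally) speaker-free code.
        self.content_encoder = nn.GRU(n_mels, content_dim, batch_first=True)
        # Target-speaker lookup table (a speaker encoder could replace it).
        self.speaker_embedding = nn.Embedding(n_speakers, speaker_dim)
        # Decoder: content code + target identity -> converted mel frames.
        self.decoder = nn.GRU(content_dim + speaker_dim, n_mels, batch_first=True)

    def forward(self, source_mels, target_speaker_id):
        # source_mels: (batch, frames, n_mels)
        content, _ = self.content_encoder(source_mels)
        spk = self.speaker_embedding(target_speaker_id)           # (batch, speaker_dim)
        spk = spk.unsqueeze(1).expand(-1, content.size(1), -1)    # broadcast over frames
        converted, _ = self.decoder(torch.cat([content, spk], dim=-1))
        return converted

# Example: convert a 300-frame utterance to speaker id 4.
model = TinyVoiceConverter()
out = model(torch.randn(1, 300, 80), torch.tensor([4]))
print(out.shape)  # torch.Size([1, 300, 80])
```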

Live speech portraits: real-time photorealistic talking-head animation

Y Lu, J Chai, X Cao - ACM Transactions on Graphics (ToG), 2021 - dl.acm.org
To the best of our knowledge, ours is the first live system that generates personalized
photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system …

Deep learning for text style transfer: A survey

D Jin, Z Jin, Z Hu, O Vechtomova… - Computational …, 2022 - direct.mit.edu
Text style transfer is an important task in natural language generation, which aims to control
certain attributes in the generated text, such as politeness, emotion, humor, and many …

Neural analysis and synthesis: Reconstructing speech from self-supervised representations

HS Choi, J Lee, W Kim, J Lee… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a neural analysis and synthesis (NANSY) framework that can manipulate the
voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have …

MakeItTalk: speaker-aware talking-head animation

Y Zhou, X Han, E Shechtman, J Echevarria… - ACM Transactions On …, 2020 - dl.acm.org
We present a method that generates expressive talking-head videos from a single facial
image with audio as the only input. In contrast to previous attempts to learn direct mappings …

ContentVec: An improved self-supervised speech representation by disentangling speakers

K Qian, Y Zhang, H Gao, J Ni, CI Lai… - International …, 2022 - proceedings.mlr.press
Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …
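
For readers unfamiliar with this pipeline, the sketch below shows the generic feature-extraction step: a network pretrained on unannotated speech yields frame-level representations for downstream use. It uses torchaudio's HuBERT bundle as a stand-in, since the ContentVec checkpoint is distributed separately, and the input file name is an assumption.

```python
# Hedged sketch of self-supervised speech feature extraction. HuBERT (via
# torchaudio) stands in for ContentVec; the input path is an assumption.
import torch
import torchaudio

bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("speech.wav")  # assumed input file
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # One tensor of frame-level features per transformer layer.
    features, _ = model.extract_features(waveform)

print(len(features), features[-1].shape)  # e.g. 12 layers, (1, frames, 768)
```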

Voice conversion challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Y Zhao, WC Huang, X Tian, J Yamagishi… - arXiv preprint arXiv …, 2020 - arxiv.org
The voice conversion challenge is a biennial scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …

DFA-NeRF: Personalized talking head generation via disentangled face attributes neural rendering

S Yao, RZ Zhong, Y Yan, G Zhai, X Yang - arXiv preprint arXiv:2201.00791, 2022 - arxiv.org
While recent advances in deep neural networks have made it possible to render high-quality
images, generating photo-realistic and personalized talking heads remains challenging. With …