Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …

Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech

J Kim, J Kong, J Son - International Conference on Machine …, 2021 - proceedings.mlr.press
Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and
parallel sampling have been proposed, but their sample quality does not match that of two …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

NaturalSpeech: End-to-end text-to-speech synthesis with human-level quality

X Tan, J Chen, H Liu, J Cong, C Zhang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Text-to-speech (TTS) has made rapid progress in both academia and industry in recent
years. Some questions naturally arise that whether a TTS system can achieve human-level …

FastSpeech 2: Fast and high-quality end-to-end text to speech

Y Ren, C Hu, X Tan, T Qin, S Zhao, Z Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org
Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize
speech significantly faster than previous autoregressive models with comparable quality …

FastSpeech: Fast, robust and controllable text to speech

Y Ren, Y Ruan, X Tan, T Qin, S Zhao… - Advances in neural …, 2019 - proceedings.neurips.cc
Neural network based end-to-end text to speech (TTS) has significantly improved the quality
of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate mel …

LibriTTS: A corpus derived from LibriSpeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

Activation functions: Comparison of trends in practice and research for deep learning

C Nwankpa, W Ijomah, A Gachagan… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep neural networks have been successfully used in diverse emerging domains to solve
real world complex problems with many more deep learning (DL) architectures, being …