Generative adversarial networks in human emotion synthesis: A review

N Hajarolasvadi, MA Ramirez, W Beccaro… - IEEE …, 2020 - ieeexplore.ieee.org
Deep generative models have become an emerging topic in various research areas like
computer vision and signal processing. These models allow synthesizing realistic data …

Deep learning serves voice cloning: how vulnerable are automatic speaker verification systems to spoofing trials?

P Partila, J Tovarek, GH Ilk, J Rozhon… - IEEE Communications …, 2020 - ieeexplore.ieee.org
This article verifies the reliability of automatic speaker verification (ASV) systems on new
synthesis methods based on deep neural networks. ASV systems are widely used and …

Casting to corpus: Segmenting and selecting spontaneous dialogue for TTS with a CNN-LSTM speaker-dependent breath detector

É Székely, GE Henter… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
This paper considers utilising breaths to create improved spontaneous-speech corpora for
conversational text-to-speech from found audio recordings such as dialogue podcasts …

Speaker recognition-assisted robust audio deepfake detection

J Pan, S Nie, H Zhang, S He, K Zhang, S Liang… - Interspeech, 2022 - isca-archive.org
Audio deepfake detection is usually formulated as a binary classification between genuine
and fake speech for an entire utterance. Environmental clues such as background and …

VAW-GAN for singing voice conversion with non-parallel training data

J Lu, K Zhou, B Sisman, H Li - 2020 Asia-Pacific Signal and …, 2020 - ieeexplore.ieee.org
Singing voice conversion aims to convert singer's voice from source to target without
changing singing content. Parallel training data is typically required for the training of …

Applications of deep learning to audio generation

Y Zhao, X Xia, R Togneri - IEEE Circuits and Systems …, 2019 - ieeexplore.ieee.org
In the recent past years, deep learning based machine learning systems have demonstrated
remarkable success for a wide range of learning tasks in multiple domains such as computer …

Noise tokens: Learning neural noise templates for environment-aware speech enhancement

H Li, J Yamagishi - arXiv preprint arXiv:2004.04001, 2020 - arxiv.org
In recent years, speech enhancement (SE) has achieved impressive progress with the
success of deep neural networks (DNNs). However, the DNN approach usually fails to …

High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder

K Chen, B Chen, J Lai, K Yu - Interspeech, 2018 - isca-archive.org
Waveform generator is a key component in voice conversion. Recently, WaveNet waveform
generator conditioned on the Mel-cepstrum (Mcep) has shown better quality over standard …

Reconstruction of Iberian ceramic potteries using generative adversarial networks

P Navarro, C Cintas, M Lucena, JM Fuertes… - Scientific reports, 2022 - nature.com
Several aspects of past culture, including historical trends, are inferred from time-based
patterns observed in archaeological artifacts belonging to different periods. The presence …

Manipulating voice attributes by adversarial learning of structured disentangled representations

L Benaroya, N Obin, A Roebel - Entropy, 2023 - mdpi.com
Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate
part of its content, primarily its identity, while maintaining the rest unchanged. Research in …