On mean absolute error for deep neural network based vector-to-vector regression

J Qi, J Du, SM Siniscalchi, X Ma… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for
the deep neural network (DNN) based vector-to-vector regression. The goal of this work is …

Generative adversarial network-based postfilter for statistical parametric speech synthesis

T Kaneko, H Kameoka, N Hojo, Y Ijima… - … on acoustics, speech …, 2017 - ieeexplore.ieee.org
We propose a postfilter based on a generative adversarial network (GAN) to compensate for
the differences between natural speech and speech synthesized by statistical parametric …

Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks

T Kaneko, H Kameoka, K Hiramatsu, K Kashino - Interspeech, 2017 - kecl.ntt.co.jp
We propose a training framework for sequence-to-sequence voice conversion (SVC). A well-
known problem regarding a conventional VC framework is that acoustic-feature sequences …

Generative adversarial network-based postfilter for STFT spectrograms

T Kaneko, S Takaki, H Kameoka… - Interspeech 2017, 2017 - research.ed.ac.uk
We propose a learning-based postfilter to reconstruct the high-fidelity spectral texture in
short-term Fourier transform (STFT) spectrograms. In speech-processing systems, such as …

Unsupervised representation disentanglement using cross domain features and adversarial learning in variational autoencoder based voice conversion

WC Huang, H Luo, HT Hwang, CC Lo… - … on Emerging Topics …, 2020 - ieeexplore.ieee.org
An effective approach for voice conversion (VC) is to disentangle linguistic content from
other components in the speech signal. The effectiveness of variational autoencoder (VAE) …

Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors

R Doddipatla, N Braunschweiler, R Maia - Interspeech, 2017 - isca-archive.org
The paper presents a mechanism to perform speaker adaptation in speech synthesis based
on deep neural networks (DNNs). The mechanism extracts speaker identification vectors …

Voice conversion based on cross-domain features using variational auto encoders

WC Huang, HT Hwang, YH Peng… - … on Chinese Spoken …, 2018 - ieeexplore.ieee.org
An effective approach to non-parallel voice conversion (VC) is to utilize deep neural
networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure …

Model architectures to extrapolate emotional expressions in DNN-based text-to-speech

K Inoue, S Hara, M Abe, N Hojo, Y Ijima - Speech Communication, 2021 - Elsevier
This paper proposes architectures that facilitate the extrapolation of emotional expressions
in deep neural network (DNN)-based text-to-speech (TTS). In this study, the meaning of …

Perception optimized deep denoising autoencoders for speech enhancement

PG Shivakumar, PG Georgiou - Interspeech, 2016 - isca-archive.org
Speech Enhancement is a challenging and important area of research due to the many
applications that depend on improved signal quality. It is a pre-processing step of speech …

Sentence-level control vectors for deep neural network speech synthesis

O Watts, Z Wu, S King - … 2015 16th Annual Conference of the …, 2015 - research.ed.ac.uk
This paper describes the use of a low-dimensional vector representation of sentence
acoustics to control the output of a feed-forward deep neural network text-to-speech system …