Ways to implement global variance in statistical speech synthesis.

J Qi, J Du, SM Siniscalchi, X Ma… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org

In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for
the deep neural network (DNN) based vector-to-vector regression. The goal of this work is …

Cited by 276 Related articles All 10 versions

[PDF] ntt.co.jp

Generative adversarial network-based postfilter for statistical parametric speech synthesis

T Kaneko, H Kameoka, N Hojo, Y Ijima… - … on acoustics, speech …, 2017 - ieeexplore.ieee.org

We propose a postfilter based on a generative adversarial network (GAN) to compensate for
the differences between natural speech and speech synthesized by statistical parametric …

Cited by 162 Related articles All 4 versions

[PDF] ntt.co.jp

[PDF] Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks.

T Kaneko, H Kameoka, K Hiramatsu, K Kashino - Interspeech, 2017 - kecl.ntt.co.jp

We propose a training framework for sequence-to-sequence voice conversion (SVC). A well-
known problem regarding a conventional VC framework is that acoustic-feature sequences …

Cited by 132 Related articles All 3 versions

[PDF] ed.ac.uk

Generative adversarial network-based postfilter for STFT spectrograms

T Kaneko, S Takaki, H Kameoka… - Interspeech 2017, 2017 - research.ed.ac.uk

We propose a learning-based postfilter to reconstruct the high-fidelity spectral texture in
short-term Fourier transform (STFT) spectrograms. In speech-processing systems, such as …

Cited by 78 Related articles All 7 versions

[PDF] ieee.org

Unsupervised representation disentanglement using cross domain features and adversarial learning in variational autoencoder based voice conversion

WC Huang, H Luo, HT Hwang, CC Lo… - … on Emerging Topics …, 2020 - ieeexplore.ieee.org

An effective approach for voice conversion (VC) is to disentangle linguistic content from
other components in the speech signal. The effectiveness of variational autoencoder (VAE) …

Cited by 50 Related articles All 7 versions

[PDF] isca-archive.org

[PDF] Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors.

R Doddipatla, N Braunschweiler, R Maia - Interspeech, 2017 - isca-archive.org

The paper presents a mechanism to perform speaker adaptation in speech synthesis based
on deep neural networks (DNNs). The mechanism extracts speaker identification vectors …

Cited by 61 Related articles All 6 versions

[PDF] arxiv.org

Voice conversion based on cross-domain features using variational auto encoders

WC Huang, HT Hwang, YH Peng… - … on Chinese Spoken …, 2018 - ieeexplore.ieee.org

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural
networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure …

Cited by 51 Related articles All 8 versions

[PDF] arxiv.org

Model architectures to extrapolate emotional expressions in DNN-based text-to-speech

K Inoue, S Hara, M Abe, N Hojo, Y Ijima - Speech Communication, 2021 - Elsevier

This paper proposes architectures that facilitate the extrapolation of emotional expressions
in deep neural network (DNN)-based text-to-speech (TTS). In this study, the meaning of …

Cited by 25 Related articles All 9 versions

[PDF] isca-archive.org

[PDF] Perception optimized deep denoising autoencoders for speech enhancement.

PG Shivakumar, PG Georgiou - Interspeech, 2016 - isca-archive.org

Speech Enhancement is a challenging and important area of research due to the many
applications that depend on improved signal quality. It is a pre-processing step of speech …

Cited by 55 Related articles All 2 versions

[PDF] ed.ac.uk

Sentence-level control vectors for deep neural network speech synthesis

O Watts, Z Wu, S King - … 2015 16th Annual Conference of the …, 2015 - research.ed.ac.uk

This paper describes the use of a low-dimensional vector representation of sentence
acoustics to control the output of a feed-forward deep neural network text-to-speech system …

Cited by 65 Related articles All 10 versions
