End-to-end text-to-speech for low-resource languages by cross-lingual transfer learning

T Tu, YJ Chen, C Yeh, HY Lee - arXiv preprint arXiv:1904.06508, 2019 - arxiv.org
End-to-end text-to-speech (TTS) has shown great success on large quantities of paired text
plus speech data. However, laborious data collection remains difficult for at least 95% of the …

Transfer learning framework for low-resource text-to-speech using a large-scale unlabeled speech corpus

M Kim, M Jeong, BJ Choi, S Ahn, JY Lee… - arXiv preprint arXiv …, 2022 - arxiv.org
Training a text-to-speech (TTS) model requires a large scale text labeled speech corpus,
which is troublesome to collect. In this paper, we propose a transfer learning framework for …

Hierarchical transfer learning for multilingual, multi-speaker, and style transfer DNN-based TTS on low-resource languages

K Azizah, M Adriani, W Jatmiko - IEEE Access, 2020 - ieeexplore.ieee.org
This work applies a hierarchical transfer learning to implement deep neural network (DNN)-
based multilingual text-to-speech (TTS) for low-resource languages. DNN-based system …

Unsupervised learning for sequence-to-sequence text-to-speech for low-resource languages

H Zhang, Y Lin - arXiv preprint arXiv:2008.04549, 2020 - arxiv.org
Recently, sequence-to-sequence models with attention have been successfully applied in
Text-to-speech (TTS). These models can generate near-human speech with a large …

SANE-TTS: stable and natural end-to-end multilingual text-to-speech

H Cho, W Jung, J Lee, SH Woo - arXiv preprint arXiv:2206.12132, 2022 - arxiv.org
In this paper, we present SANE-TTS, a stable and natural end-to-end multilingual TTS
model. By the difficulty of obtaining multilingual corpus for given speaker, training …

Text-to-speech for low-resource agglutinative language with morphology-aware language model pre-training

R Liu, Y Hu, H Zuo, Z Luo, L Wang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Text-to-Speech (TTS) aims to convert the input text to a human-like voice. With the
development of deep learning, encoder-decoder based TTS models perform superior …

Unsupervised polyglot text-to-speech

E Nachmani, L Wolf - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
We present a TTS neural network that is able to produce speech in multiple languages. The
proposed network is able to transfer a voice, which was presented as a sample in a source …

[PDF][PDF] Towards Universal Text-to-Speech.

J Yang, L He - Interspeech, 2020 - interspeech2020.org
This paper studies a multilingual sequence-to-sequence textto-speech framework towards
universal modeling, that is able to synthesize speech for any speaker in any language using …

Building a mixed-lingual neural TTS system with only monolingual data

L Xue, W Song, G Xu, L Xie, Z Wu - arXiv preprint arXiv:1904.06063, 2019 - arxiv.org
When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the
challenges is to synthesize Chinese utterances with English phrases or words embedded …

End-to-end code-switched tts with mix of monolingual recordings

Y Cao, X Wu, S Liu, J Yu, X Li, Z Wu… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
State-of-the-art text-to-speech (TTS) synthesis models can produce monolingual speech
with high intelligibility and naturalness. However, when the models are applied to synthesize …