The MARA corpus: Expressivity in end-to-end TTS systems using synthesised speech data

A Stan, B Lőrincz, M Nuțu… - … Conference on Speech …, 2021 - ieeexplore.ieee.org
2021 International Conference on Speech Technology and Human …, 2021 - ieeexplore.ieee.org
This paper introduces the MARA corpus, a large expressive Romanian speech corpus containing over 11 hours of high-quality data recorded by a professional female speaker. The data is orthographically transcribed, manually segmented at utterance level and semi-automatically aligned at phone level. The associated text is processed by a complete linguistic feature extractor composed of: text normalisation, phonetic transcription, syllabification, lexical stress assignment, lemma extraction, part-of-speech tagging, chunking and dependency parsing.

Using the MARA corpus, we evaluate the use of synthesised speech as training data in end-to-end speech synthesis systems. The synthesised data copies the original phone duration and F0 patterns of the most expressive utterances from MARA. Five systems with different sets of expressive data are trained. The objective and subjective results show that the low quality of the synthesised speech data is averaged out by the synthesis network, and that no statistically significant differences are found between the systems’ expressivity and naturalness evaluations.