We propose a two-stage emotion-controllable text-to-speech (TTS) model that increases the diversity of intra-emotion variation while preserving inter-emotion controllability in …
AS Suni, J Simko, MT Vainio - Speech Prosody, 2016 - researchportal.helsinki.fi
Unsupervised boundary detection and classification is both a theoretically interesting question and an important challenge for speech technology. Theoretical interest lies in …
A Suni, D Aalto, M Vainio - arXiv preprint arXiv:1510.01949, 2015 - arxiv.org
Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide a means to chunk the speech stream into linguistically relevant units by …
The motivation for this exploratory research project is twofold. First, there is the contrast between the importance of certain phenomena in English spoken communication on the one …
M Vainio - Statistical Language and Speech Processing: Second …, 2014 - Springer
Text-to-speech synthesis is a task that solves many real-world problems such as providing speaking and reading ability to people who lack those capabilities. It is thus viewed mainly …
A Dannenberg, S Werner, M Vainio - Proceedings of the …, 2016 - isca-archive.org
In this paper we examine prosodic and syntactic structures of spontaneous English speech. By wavelet-based analysis, the prosodic structure of speech can be visually represented as …
Statistical parametric speech synthesis (SPSS) has seen improvements over recent years, especially in terms of intelligibility. Synthetic speech is often clear and understandable, but it …
K Hirose - 2016 IEEE 13th International Conference on Signal …, 2016 - ieeexplore.ieee.org
Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based ones, have gained particular attention from researchers because of their ability to generate speech in …
Text-to-speech (TTS) models aim to synthesize human-like speech including linguistic and paralinguistic information. Current TTS models [1, 2] can synthesize understandable speech …