Hierarchical representation and estimation of prosody using continuous wavelet transform

A Suni, J Šimko, D Aalto, M Vainio - Computer Speech & Language, 2017 - Elsevier
Prominences and boundaries are the essential constituents of prosodic structure in speech.
They provide for means to chunk the speech stream into linguistically relevant units by …

Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence

X Luo, S Takamichi, Y Saito, T Koriyama… - … on Signal and …, 2024 - nowpublishers.com
We propose a two-stage emotion-controllable text-to-speech (TTS) model that can increase
the diversity of intra-emotion variation and also preserve inter-emotion controllability in …

Boundary detection using continuous wavelet analysis

AS Suni, J Simko, MT Vainio - Speech prosody, 2016 - researchportal.helsinki.fi
Unsupervised boundary detection and classification is both a theoretically interesting
question and an important challenge for speech technology. Theoretical interest lies in …

Hierarchical representation of prosody for statistical speech synthesis

A Suni, D Aalto, M Vainio - arXiv preprint arXiv:1510.01949, 2015 - arxiv.org
Prominences and boundaries are the essential constituents of prosodic structure in speech.
They provide for means to chunk the speech stream into linguistically relevant units by …

Wavelet-based Visualisation of Weak Forms and Connected Speech Processes in

L Kalvodová - Sciences, 2024 - helda.helsinki.fi
The motivation for this exploratory research project is twofold. First, there is the contrast
between the importance of certain phenomena in English spoken communication on the one …

Phonetics and machine learning: hierarchical modelling of prosody in statistical speech synthesis

M Vainio - Statistical Language and Speech Processing: Second …, 2014 - Springer
Text-to-speech synthesis is a task that solves many real-world problems such as providing
speaking and reading ability to people who lack those capabilities. It is thus viewed mainly …

Prosodic and syntactic structures in spontaneous English speech

A Dannenberg, S Werner, M Vainio - Proceedings of the …, 2016 - isca-archive.org
In this paper we examine prosodic and syntactic structures of spontaneous English speech.
By wavelet-based analysis, the prosodic structure of speech can be visually represented as …

Suprasegmental representations for the modeling of fundamental frequency in statistical parametric speech synthesis

M Fonseca De Sam Bento Ribeiro - 2018 - core.ac.uk
Statistical parametric speech synthesis (SPSS) has seen improvements over recent years,
especially in terms of intelligibility. Synthetic speech is often clear and understandable, but it …

Modeling of fundamental frequency contours for HMM-based speech synthesis: Representation of fundamental frequency contours for statistical speech synthesis

K Hirose - 2016 IEEE 13th International Conference on Signal …, 2016 - ieeexplore.ieee.org
Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based
ones, gain special attention from researchers because of their ability in generating speech in …

Emotion-controllable Speech Synthesis using Emotion Soft Label and Word-level Prominence

Text-to-speech (TTS) models aim to synthesize human-like speech including linguistic and
paralinguistic information. Current TTS models [1, 2] can synthesize understandable speech …