Emphasis, word prominence, and continuous wavelet transform in the control of HMM-based synthesis

A Suni, J Šimko, D Aalto, M Vainio - Computer Speech & Language, 2017 - Elsevier

Prominences and boundaries are the essential constituents of prosodic structure in speech.
They provide for means to chunk the speech stream into linguistically relevant units by …

Cited by 82 · Related articles · All 11 versions

[PDF] nowpublishers.com

[PDF] Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence

X Luo, S Takamichi, Y Saito, T Koriyama… - … on Signal and …, 2024 - nowpublishers.com

We propose a two-stage emotion-controllable text-to-speech (TTS) model that can increase
the diversity of intra-emotion variation and also preserve inter-emotion controllability in …

[PDF] Boundary detection using continuous wavelet analysis

AS Suni, J Simko, MT Vainio - Speech prosody, 2016 - researchportal.helsinki.fi

Unsupervised boundary detection and classification is both a theoretically interesting
question and an important challenge for speech technology. Theoretical interest lies in …

Cited by 8 · Related articles · All 5 versions

[PDF] arxiv.org

Hierarchical representation of prosody for statistical speech synthesis

A Suni, D Aalto, M Vainio - arXiv preprint arXiv:1510.01949, 2015 - arxiv.org

Prominences and boundaries are the essential constituents of prosodic structure in speech.
They provide for means to chunk the speech stream into linguistically relevant units by …

Cited by 8 · Related articles · All 4 versions

[PDF] helsinki.fi

[PDF] Wavelet-based Visualisation of Weak Forms and Connected Speech Processes in

L Kalvodová - Sciences, 2024 - helda.helsinki.fi

The motivation for this exploratory research project is twofold. First, there is the contrast
between the importance of certain phenomena in English spoken communication on the one …

Phonetics and machine learning: hierarchical modelling of prosody in statistical speech synthesis

M Vainio - Statistical Language and Speech Processing: Second …, 2014 - Springer

Text-to-speech synthesis is a task that solves many real-world problems such as providing
speaking and reading ability to people who lack those capabilities. It is thus viewed mainly …

Cited by 4 · Related articles · All 4 versions

[PDF] isca-archive.org

[PDF] Prosodic and syntactic structures in spontaneous English speech

A Dannenberg, S Werner, M Vainio - Proceedings of the …, 2016 - isca-archive.org

In this paper we examine prosodic and syntactic structures of spontaneous English speech.
By wavelet-based analysis, the prosodic structure of speech can be visually represented as …

Cited by 3 · Related articles · All 5 versions

[PDF] core.ac.uk

[PDF] Suprasegmental representations for the modeling of fundamental frequency in statistical parametric speech synthesis

M Fonseca De Sam Bento Ribeiro - 2018 - core.ac.uk

Statistical parametric speech synthesis (SPSS) has seen improvements over recent years,
especially in terms of intelligibility. Synthetic speech is often clear and understandable, but it …

Cited by 2 · Related articles · All 3 versions

[PDF] researchgate.net

Modeling of fundamental frequency contours for HMM-based speech synthesis: Representation of fundamental frequency contours for statistical speech synthesis

K Hirose - 2016 IEEE 13th International Conference on Signal …, 2016 - ieeexplore.ieee.org

Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based
ones, gain special attention from researchers because of their ability in generating speech in …

Cited by 2 · Related articles · All 2 versions

[PDF] sython.org

[PDF] Emotion-controllable Speech Synthesis using Emotion Soft Label and Word-level Prominence

X Luo, S Takamichi, Y Saito, H Saruwatari - sython.org

Text-to-speech (TTS) models aim to synthesize human-like speech including linguistic and
paralinguistic information. Current TTS models [1, 2] can synthesize understandable speech …
