A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

A novel tracking deep wavelet auto-encoder method for intelligent fault diagnosis of electric locomotive bearings

S Haidong, J Hongkai, Z Ke, W Dongdong… - Mechanical Systems and …, 2018 - Elsevier
The condition monitoring of electric locomotive has attracted more and more attention due to
its significance for improving the security, reliability and automation level. In this paper, a …

Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

H Barakat, O Turk, C Demiroglu - EURASIP Journal on Audio, Speech, and …, 2024 - Springer
Speech synthesis has made significant strides thanks to the transition from machine learning
to deep learning models. Contemporary text-to-speech (TTS) models possess the capability …

Speech prosody enhances the neural processing of syntax

G Degano, PW Donhauser, L Gwilliams… - Communications …, 2024 - nature.com
Human language relies on the correct processing of syntactic information, as it is essential
for successful communication between speakers. As an abstract level of language, syntax …

Hierarchical prosody modeling for non-autoregressive speech synthesis

CM Chien, H Lee - 2021 IEEE Spoken Language Technology …, 2021 - ieeexplore.ieee.org
Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks.
By explicitly providing prosody features to the TTS model, the style of synthesized utterances …

Prosody-controllable spontaneous TTS with neural HMMs

H Lameris, S Mehta, GE Henter… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Spontaneous speech has many affective and pragmatic functions that are interesting and
challenging to model in TTS. However, the presence of reduced articulation, fillers …

Spoken Language Identification: An overview of past and present research trends

D O'Shaughnessy - Speech Communication, 2024 - Elsevier
Identification of the language used in spoken utterances is useful for multiple applications,
e.g., assist in directing or automating telephone calls, or selecting which language-specific …

Prosody and fluency of Finland Swedish as a second language: Investigating global parameters for automated speaking assessment

H Kallio, M Kautonen, M Kuronen - Speech Communication, 2023 - Elsevier
This study investigates prosody and fluency of Finland Swedish as a second language (L2).
The main objective is to investigate global measures of prosody and fluency as predictors of …

Event-related responses reflect chunk boundaries in natural speech

I Anurova, S Vetchinnikova, A Dobrego, N Williams… - NeuroImage, 2022 - Elsevier
Chunking language has been proposed to be vital for comprehension enabling the
extraction of meaning from a continuous stream of speech. However, neurocognitive …

Predicting prosodic prominence from text with pre-trained contextualized word representations

A Talman, A Suni, H Celikkanat, S Kakouros… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper we introduce a new natural language processing dataset and benchmark for
predicting prosodic prominence from written text. To our knowledge this will be the largest …