Speech synthesis evaluation—state-of-the-art assessment and suggestion for a novel research program

P Wagner, J Beskow, S Betz, J Edlund… - Proceedings of the 10th …, 2019 - core.ac.uk
Speech synthesis applications have become an ubiquity, in navigation systems, digital
assistants or as screen or audio book readers. Despite their impact on the acceptability of …

Planning the development of text-to-speech synthesis models and datasets with dynamic deep learning

HA Ahmad, TA Rashid - Journal of King Saud University-Computer and …, 2024 - Elsevier
Synthesis of Text-to-speech (TTS) is a process that involves translating a natural language
text into a speech. Speech synthesisers face a major challenge when recognizing the …

Choice of voices: A large-scale evaluation of text-to-speech voice quality for long-form content

J Cambre, J Colnago, J Maddock, J Tsai… - Proceedings of the 2020 …, 2020 - dl.acm.org
The advancement of text-to-speech (TTS) voices and a rise of commercial TTS platforms
allow people to easily experience TTS voices across a variety of technologies, applications …

Evaluating and personalizing user-perceived quality of text-to-speech voices for delivering mindfulness meditation with different physical embodiments

Z Shi, H Chen, AM Velentza, S Liu, N Dennler… - Proceedings of the …, 2023 - dl.acm.org
Mindfulness-based therapies have been shown to be effective in improving mental health,
and technology-based methods have the potential to expand the accessibility of these …

Dynamic prosody generation for speech synthesis using linguistics-driven acoustic embedding selection

S Tyagi, M Nicolis, J Rohnke, T Drugman… - arXiv preprint arXiv …, 2019 - arxiv.org
Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to
near-human capabilities when considering isolated sentences. But something which is still …

Camp: a two-stage approach to modelling prosody in context

Z Hodari, A Moinet, S Karlapati… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Prosody is an integral part of communication, but remains an open problem in
state-of-the-art speech synthesis. There are two major issues faced when modelling prosody: (1) prosody …

Multi-granularity annotation of instantaneous intelligibility of learners' utterances based on shadowing techniques

C Zhu, R Hakoda, D Saito, N Minematsu… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
The practical goal of pronunciation training is to acquire an intelligible enough
pronunciation, not a native-like pronunciation. In our studies [1],[2], we proposed a method …

Building and designing expressive speech synthesis

MP Aylett, L Clark, BR Cowan, I Torre - … Agents: 20 years of Research on …, 2021 - dl.acm.org
We know there is something special about speech. Our voices are not just a means of
communicating. They also give a deep impression of who we are and what we might know …

Measuring speech recognition with a matrix test using synthetic speech

T Nuesse, B Wiercinski, T Brand… - Trends in Hearing, 2019 - journals.sagepub.com
Speech audiometry is an essential part of audiological diagnostics and clinical
measurements. Development times of speech recognition tests are rather long, depending …

Detection of Learners' Listening Breakdown with Oral Dictation and Its Use to Model Listening Skill Improvement Exclusively Through Shadowing.

T Kunihara, C Zhu, D Saito, N Minematsu… - …, 2022 - isca-archive.org
In language learners' speech, mispronounced words, word fragments, repairs, filled pauses,
etc. are often found, and they can be detected with ASR-based CALL systems. When …