Liaison and pronunciation learning in end-to-end text-to-speech in French

J Taylor, S Le Maguer, K Richmond - The 11th ISCA Speech …, 2021 - research.ed.ac.uk
Abstract: Sequence-to-sequence (S2S) TTS models like Tacotron have grapheme-only
inputs when trained fully end-to-end. Grapheme inputs map to phone sounds depending on …

Can Prosody Transfer Embeddings be Used for Prosody Assessment?

M Juliao, A Abad, H Moniz - Proc. Speech Prosody 2022, 2022 - isca-archive.org
In voice conversion, it is possible to transfer some characteristic components of a (target)
speech utterance, such as the content, pitch, or speaker identity, from the corresponding …

Believe in the Sound You See: The Effects of Body Type and Voice Pitch on the Perceived Audio-Visual Correspondence and Believability of Virtual Characters

L Lam - 2023 - hammer.purdue.edu
Lam, Luchcha. MS, Purdue University, May 2023. Believe in the Sound You See: The Effects
of Body Type and Voice Pitch on the Perceived Audio-Visual Correspondence and …

INVESTIGATING EMOTION EMBEDDING BASED TEXT-TO-SPEECH MODELS UNDER LIMITED TRAINING DATA

H PIJPELINK - arno.uvt.nl
This thesis studies the effect of limiting training data on emotion embedding models for Text-
to-Speech (TTS) systems. In order to reproduce natural human prosody, TTS models use …