Sentence-level control vectors for deep neural network speech synthesis

O Watts, Z Wu, S King - … 2015 16th Annual Conference of the …, 2015 - research.ed.ac.uk
This paper describes the use of a low-dimensional vector representation of sentence
acoustics to control the output of a feed-forward deep neural network text-to-speech system …

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

W Nakata, T Koriyama, S Takamichi, Y Saito, Y Ijima… - Interspeech, 2022 - isca-archive.org
We propose a speech-synthesis model for predicting appropriate voice styles on the basis of
the character-annotated text for audiobook speech synthesis. An audiobook is more …

Expressive speech synthesis in MARY TTS using audiobook data and emotionML.

M Charfuelan, I Steiner - INTERSPEECH, 2013 - isca-archive.org
This paper describes a framework for synthesis of expressive speech based on MARY TTS
and Emotion Markup Language (EmotionML). We describe the creation of expressive unit …

ALISA: An automatic lightly supervised speech segmentation and alignment tool

A Stan, Y Mamiya, J Yamagishi, P Bell, O Watts… - Computer Speech & …, 2016 - Elsevier
This paper describes the ALISA tool, which implements a lightly supervised method for
sentence-level alignment of speech with imperfect transcripts. Its intended use is to enable …

J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis

S Takamichi, W Nakata, N Tanji… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we construct a Japanese audiobook speech corpus called "J-MAC" for speech
synthesis research. With the success of reading-style speech synthesis, the research target …

Using Audio Books for Training a Text-to-Speech System.

A Chalamandaris, P Tsiakoulis, S Karabetsos, S Raptis - LREC, 2014 - lrec-conf.org
Creating new voices for a TTS system often requires a costly procedure of designing and
recording an audio corpus, a time-consuming and effort-intensive task. Using publicly …

Narrative aesthetic absorption in audiobooks is predicted by blink rate and acoustic features.

EB Lange, D Thiele, MM Kuijpers - Psychology of Aesthetics …, 2022 - psycnet.apa.org
Narrative aesthetic absorption describes a state in which we focus on the story world of a
narrative while becoming less aware of our surroundings and ourselves. It is characterized …

The role of prosody and voice quality in indirect storytelling speech: A cross-narrator perspective in four European languages

R Montaño, F Alías - Speech Communication, 2017 - Elsevier
Over the last decades, the majority of works devoted to expressive speech acoustic
analysis have focused on emotions, although there is a growing interest in other speaking …

Unsupervised learning for expressive speech synthesis

I Jauk - 2017 - upcommons.upc.edu
Nowadays, especially with the upswing of neural networks, speech synthesis is almost
totally data-driven. The goal of this thesis is to provide methods for automatic and …

Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data.

A Stan, P Bell, J Yamagishi, S King - INTERSPEECH, 2013 - isca-archive.org
This paper introduces a method for lightly supervised discriminative training using MMI to
improve the alignment of speech and text data for use in training HMM-based TTS systems …