Sentence-level control vectors for deep neural network speech synthesis

O Watts, Z Wu, S King - … 2015 16th Annual Conference of the …, 2015 - research.ed.ac.uk
This paper describes the use of a low-dimensional vector representation of sentence
acoustics to control the output of a feed-forward deep neural network text-to-speech system …

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

W Nakata, T Koriyama, S Takamichi, Y Saito, Y Ijima… - Interspeech, 2022 - isca-archive.org
We propose a speech-synthesis model for predicting appropriate voice styles on the basis of
the character-annotated text for audiobook speech synthesis. An audiobook is more …

Expressive speech synthesis in MARY TTS using audiobook data and emotionML.

M Charfuelan, I Steiner - INTERSPEECH, 2013 - isca-archive.org
This paper describes a framework for synthesis of expressive speech based on MARY TTS
and Emotion Markup Language (EmotionML). We describe the creation of expressive unit …

ALISA: An automatic lightly supervised speech segmentation and alignment tool

A Stan, Y Mamiya, J Yamagishi, P Bell, O Watts… - Computer Speech & …, 2016 - Elsevier
This paper describes the ALISA tool, which implements a lightly supervised method for
sentence-level alignment of speech with imperfect transcripts. Its intended use is to enable …

J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis

S Takamichi, W Nakata, N Tanji… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we construct a Japanese audiobook speech corpus called "J-MAC" for speech
synthesis research. With the success of reading-style speech synthesis, the research target …

Using Audio Books for Training a Text-to-Speech System.

A Chalamandaris, P Tsiakoulis, S Karabetsos, S Raptis - LREC, 2014 - lrec-conf.org
Creating new voices for a TTS system often requires a costly procedure of designing and
recording an audio corpus, a time-consuming and effort-intensive task. Using publicly …

Narrative aesthetic absorption in audiobooks is predicted by blink rate and acoustic features.

EB Lange, D Thiele, MM Kuijpers - Psychology of Aesthetics …, 2022 - psycnet.apa.org
Narrative aesthetic absorption describes a state in which we focus on the story world of a
narrative while becoming less aware of our surroundings and ourselves. It is characterized …

The role of prosody and voice quality in indirect storytelling speech: A cross-narrator perspective in four European languages

R Montaño, F Alías - Speech Communication, 2017 - Elsevier
Over the last decades, the majority of works devoted to expressive speech acoustic
analysis have focused on emotions, although there is a growing interest in other speaking …

Unsupervised learning for expressive speech synthesis

I Jauk - 2017 - upcommons.upc.edu
Nowadays, especially with the upswing of neural networks, speech synthesis is almost
totally data-driven. The goal of this thesis is to provide methods for automatic and …

Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data.

A Stan, P Bell, J Yamagishi, S King - INTERSPEECH, 2013 - isca-archive.org
This paper introduces a method for lightly supervised discriminative training using MMI to
improve the alignment of speech and text data for use in training HMM-based TTS systems …