Expressive TTS training with frame and style reconstruction loss

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that
improves the speech styling at utterance level. One of the key challenges in prosody …

Transforming spectrum and prosody for emotional voice conversion with non-parallel training data

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2002.00198, 2020 - arxiv.org
Emotional voice conversion aims to convert the spectrum and prosody to change the
emotional patterns of speech, while preserving the speaker identity and linguistic content …

Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion

H Ming, DY Huang, L Xie, J Wu, M Dong, H Li - Interspeech, 2016 - isca-archive.org
Emotional voice conversion aims at converting speech from one emotion state to another.
This paper proposes to model timbre and prosody features using a deep bidirectional long …

Hierarchical representation and estimation of prosody using continuous wavelet transform

A Suni, J Šimko, D Aalto, M Vainio - Computer Speech & Language, 2017 - Elsevier
Prominences and boundaries are the essential constituents of prosodic structure in speech.
They provide for means to chunk the speech stream into linguistically relevant units by …

Group sparse representation with WaveNet vocoder adaptation for spectrum and prosody conversion

B Sisman, M Zhang, H Li - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
The statistical approach to voice conversion typically consists of a feature conversion
module followed by a vocoder. So far, the feature conversion studies are mainly focused on …

An improved CycleGAN-based emotional voice conversion model by augmenting temporal dependency with a transformer

C Fu, C Liu, CT Ishi, H Ishiguro - Speech Communication, 2022 - Elsevier
Emotional voice conversion (EVC) is a task that converts an utterance's emotional features
into a target one while retaining semantic information and speaker identity. Recently, some …

Transformation of prosody in voice conversion

B Şişman, H Li, KC Tan - 2017 Asia-Pacific Signal and …, 2017 - ieeexplore.ieee.org
Voice Conversion (VC) aims to convert one's voice to sound like that of another. So far, most
of the voice conversion frameworks mainly focus only on the conversion of spectrum. We …

Emotional voice conversion using dual supervised adversarial networks with continuous wavelet transform F0 features

Z Luo, J Chen, T Takiguchi… - IEEE/ACM Transactions on …, 2019 - ieeexplore.ieee.org
In emotional voice conversion (VC) tasks, it is difficult to deal with a simple representation of
fundamental frequency (F0), which is the most important feature in emotional voice …

A comparative study of fundamental frequency stability between speech and singing

BR De Medeiros, JP Cabral, AR Meireles… - Speech …, 2021 - Elsevier
Speaking and singing are mechanisms of vocal production that have distinct articulatory
properties and consequently produce sounds that are normally perceived as different …

Fundamental frequency modeling using wavelets for emotional voice conversion

H Ming, D Huang, M Dong, H Li, L Xie… - 2015 International …, 2015 - ieeexplore.ieee.org
This paper is to show a representation of fundamental frequency (F0) using continuous
wavelet transform (CWT) for prosody modeling in emotion conversion. Emotional conversion …