Continuous wavelet transform for analysis of speech prosody

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that
improves the speech styling at utterance level. One of the key challenges in prosody …

Cited by 92 · Related articles · All 4 versions

Transforming spectrum and prosody for emotional voice conversion with non-parallel training data

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2002.00198, 2020 - arxiv.org

Emotional voice conversion aims to convert the spectrum and prosody to change the
emotional patterns of speech, while preserving the speaker identity and linguistic content …

Cited by 88 · Related articles · All 7 versions


Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion

H Ming, DY Huang, L Xie, J Wu, M Dong, H Li - Interspeech, 2016 - isca-archive.org

Emotional voice conversion aims at converting speech from one emotion state to another.
This paper proposes to model timbre and prosody features using a deep bidirectional long …

Cited by 95 · Related articles · All 6 versions

Hierarchical representation and estimation of prosody using continuous wavelet transform

A Suni, J Šimko, D Aalto, M Vainio - Computer Speech & Language, 2017 - Elsevier

Prominences and boundaries are the essential constituents of prosodic structure in speech.
They provide for means to chunk the speech stream into linguistically relevant units by …

Cited by 82 · Related articles · All 11 versions

Group sparse representation with wavenet vocoder adaptation for spectrum and prosody conversion

B Sisman, M Zhang, H Li - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org

The statistical approach to voice conversion typically consists of a feature conversion
module followed by a vocoder. So far, the feature conversion studies are mainly focused on …

Cited by 54 · Related articles · All 3 versions

An improved CycleGAN-based emotional voice conversion model by augmenting temporal dependency with a transformer

C Fu, C Liu, CT Ishi, H Ishiguro - Speech Communication, 2022 - Elsevier

Emotional voice conversion (EVC) is a task that converts an utterance's emotional features
into a target one while retaining semantic information and speaker identity. Recently, some …

Cited by 9 · Related articles · All 4 versions

Transformation of prosody in voice conversion

B Şişman, H Li, KC Tan - 2017 Asia-Pacific Signal and …, 2017 - ieeexplore.ieee.org

Voice Conversion (VC) aims to convert one's voice to sound like that of another. So far, most
of the voice conversion frameworks mainly focus only on the conversion of spectrum. We …

Cited by 40 · Related articles · All 5 versions

Emotional voice conversion using dual supervised adversarial networks with continuous wavelet transform f0 features

Z Luo, J Chen, T Takiguchi… - IEEE/ACM Transactions on …, 2019 - ieeexplore.ieee.org

In emotional voice conversion (VC) tasks, it is difficult to deal with a simple representation of
fundamental frequency (F0), which is the most important feature in emotional voice …

Cited by 32 · Related articles · All 3 versions

A comparative study of fundamental frequency stability between speech and singing

BR De Medeiros, JP Cabral, AR Meireles… - Speech …, 2021 - Elsevier

Speaking and singing are mechanisms of vocal production that have distinct articulatory
properties and consequently produce sounds that are normally perceived as different …

Cited by 20 · Related articles · All 3 versions

Fundamental frequency modeling using wavelets for emotional voice conversion

H Ming, D Huang, M Dong, H Li, L Xie… - 2015 International …, 2015 - ieeexplore.ieee.org

This paper is to show a representation of fundamental frequency (F0) using continuous
wavelet transform (CWT) for prosody modeling in emotion conversion. Emotional conversion …

Cited by 42 · Related articles · All 6 versions
