Wavelets for intonation modeling in HMM speech synthesis

SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier

Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …

Cited by 352 - Related articles - All 6 versions

FastSpeech 2: Fast and high-quality end-to-end text to speech

Y Ren, C Hu, X Tan, T Qin, S Zhao, Z Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org

Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize
speech significantly faster than previous autoregressive models with comparable quality …

Cited by 1572 - Related articles - All 3 versions

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier

In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cited by 177 - Related articles - All 7 versions

Expressive TTS training with frame and style reconstruction loss

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that
improves the speech styling at utterance level. One of the key challenges in prosody …

Cited by 92 - Related articles - All 4 versions

Transforming spectrum and prosody for emotional voice conversion with non-parallel training data

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2002.00198, 2020 - arxiv.org

Emotional voice conversion aims to convert the spectrum and prosody to change the
emotional patterns of speech, while preserving the speaker identity and linguistic content …

Cited by 88 - Related articles - All 7 versions

From speaker to dubber: movie dubbing with prosody and duration consistency learning

Z Zhang, L Li, G Cong, H Yin, Y Gao, C Yan… - Proceedings of the …, 2024 - dl.acm.org

Movie Dubbing aims to convert scripts into speeches that align with the given movie clip in
both temporal and emotional aspects while preserving the vocal timbre of one brief …

Cited by 6 - Related articles - All 4 versions

Converting anyone's emotion: Towards speaker-independent emotional voice conversion

K Zhou, B Sisman, M Zhang, H Li - arXiv preprint arXiv:2005.07025, 2020 - arxiv.org

Emotional voice conversion aims to convert the emotion of speech from one state to another
while preserving the linguistic content and speaker identity. The prior studies on emotional …

Cited by 66 - Related articles - All 10 versions

Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion.

H Ming, DY Huang, L Xie, J Wu, M Dong, H Li - Interspeech, 2016 - isca-archive.org

Emotional voice conversion aims at converting speech from one emotion state to another.
This paper proposes to model timbre and prosody features using a deep bidirectional long …

Cited by 95 - Related articles - All 6 versions

Hierarchical representation and estimation of prosody using continuous wavelet transform

A Suni, J Šimko, D Aalto, M Vainio - Computer Speech & Language, 2017 - Elsevier

Prominences and boundaries are the essential constituents of prosodic structure in speech.
They provide for means to chunk the speech stream into linguistically relevant units by …

Cited by 82 - Related articles - All 11 versions

Fusion of spectral and prosody modelling for multilingual speech emotion conversion

S Vekkot, D Gupta - Knowledge-Based Systems, 2022 - Elsevier

The paper proposes an integrated speech emotion conversion framework developed using
speaker-independent mixed-lingual training. The key contribution of the work is non-parallel …

Cited by 24 - Related articles - All 2 versions
