Non-autoregressive text-to-speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models, with comparable quality …
In this paper, we first review state-of-the-art emotional voice conversion research and the existing emotional speech databases. We then motivate the development …
R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose a novel training strategy for Tacotron-based text-to-speech (TTS) systems that improves speech styling at the utterance level. One of the key challenges in prosody …
K Zhou, B Sisman, H Li - arXiv preprint arXiv:2002.00198, 2020 - arxiv.org
Emotional voice conversion aims to convert the spectrum and prosody to change the emotional patterns of speech, while preserving the speaker identity and linguistic content …
Z Zhang, L Li, G Cong, H Yin, Y Gao, C Yan… - Proceedings of the …, 2024 - dl.acm.org
Movie dubbing aims to convert scripts into speech that aligns with the given movie clip in both temporal and emotional aspects while preserving the vocal timbre of one brief …
Emotional voice conversion aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity. Prior studies on emotional …
H Ming, DY Huang, L Xie, J Wu, M Dong, H Li - Interspeech, 2016 - isca-archive.org
Emotional voice conversion aims to convert speech from one emotional state to another. This paper proposes to model timbre and prosody features using a deep bidirectional long …
Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide the means to chunk the speech stream into linguistically relevant units by …
S Vekkot, D Gupta - Knowledge-Based Systems, 2022 - Elsevier
The paper proposes an integrated speech emotion conversion framework developed using speaker-independent mixed-lingual training. The key contribution of the work is non-parallel …