Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion

Authors

Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li

Publication date

2021/10/20

Conference

Proc. Interspeech 2022

Pages

2603-2607

Description

Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Due to the hierarchical structure of speech emotion, it is challenging to disentangle the emotional style for different speakers. Inspired by the recent success of speaker disentanglement with the variational autoencoder (VAE), we propose an any-to-any expressive voice conversion framework called StyleVC. StyleVC is designed to disentangle linguistic content, speaker identity, pitch, and emotional style information. We study the use of a style encoder to model emotional style explicitly. At run-time, StyleVC converts both speaker identity and emotional style for arbitrary speakers. Experiments validate the effectiveness of our proposed framework in both objective and subjective evaluations.

Total citations

Cited by 24

Citations per year: 2022: 2, 2023: 12, 2024: 10

Scholar articles

Disentanglement of emotional style and speaker identity for expressive voice conversion

Z Du, B Sisman, K Zhou, H Li - arXiv preprint arXiv:2110.10326, 2021

Cited by 22 (all 5 versions)

Identity conversion for emotional speakers: A study for disentanglement of emotion style and speaker identity*

Z Du, B Sisman, K Zhou, H Li - arXiv preprint arXiv:2110.10326, 2021

Cited by 3