[PDF] from utdallas.edu

Mixed emotion modelling for emotional voice conversion

Authors

Kun Zhou, Berrak Sisman, Carlos Busso, Haizhou Li

Publication date

2024

Journal

Speaker Odyssey

Description

Emotional voice conversion (EVC) aims to convert the emotional state of an utterance from one emotion to another while preserving the linguistic content and speaker identity. Current studies mostly focus on modelling the conversion between several specific emotion types. Synthesizing mixed effects of emotions could help us to better imitate human emotions and facilitate more natural human-computer interaction. In this research, for the first time, we formulate and study the research problem of mixed emotion synthesis for EVC. We regard emotional styles as a series of emotion attributes that are learnt from a ranking-based support vector machine (SVM). Each attribute measures the degree of relevance between the speech recordings belonging to different emotion types. We then incorporate those attributes into a sequence-to-sequence (seq2seq) emotional voice conversion framework. During training, the framework not only learns to characterize the input emotional style, but also quantifies its relevance to other emotion types. At runtime, various emotional mixtures can be produced by manually defining the attributes. We conduct objective and subjective evaluations to validate our idea in terms of mixed emotion synthesis. We further build an emotion triangle as an application of emotion transition. Code and speech samples are publicly available.
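The ranking-based attribute idea in the description can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain linear pairwise ranking SVM trained by subgradient descent, and the 2-D feature vectors, the emotion names, and the helpers `train_rank_svm`, `attribute`, and `mixture_vector` are all hypothetical, chosen only for illustration.

```python
from itertools import product


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_rank_svm(stronger, weaker, epochs=200, lr=0.1, reg=0.01):
    """Learn w so that w.x_s exceeds w.x_w by a margin of 1 for each pair.

    Minimises reg/2 * ||w||^2 + mean hinge(1 - w.(x_s - x_w))
    with full-batch subgradient descent (an assumed, simplified solver).
    """
    dim = len(stronger[0])
    w = [0.0] * dim
    # One difference vector per (stronger, weaker) pair.
    diffs = [[a - b for a, b in zip(s, k)] for s, k in product(stronger, weaker)]
    for _ in range(epochs):
        grad = [reg * wi for wi in w]
        for d in diffs:
            if dot(w, d) < 1.0:  # this pair violates the margin
                for i in range(dim):
                    grad[i] -= d[i] / len(diffs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def attribute(w, x, lo, hi):
    """Min-max normalise a ranking score into [0, 1]."""
    return min(1.0, max(0.0, (dot(w, x) - lo) / (hi - lo)))


def mixture_vector(emotions, mixture):
    """Build a manually defined attribute vector; unspecified emotions get 0."""
    for name, value in mixture.items():
        if name not in emotions or not 0.0 <= value <= 1.0:
            raise ValueError(f"bad attribute: {name}={value}")
    return [mixture.get(name, 0.0) for name in emotions]


# Hypothetical toy features: "happy" utterances versus "neutral" ones.
happy = [[2.0, 1.0], [2.5, 1.2], [1.8, 0.9]]
neutral = [[0.5, 1.0], [0.7, 1.1], [0.4, 0.8]]

w = train_rank_svm(happy, neutral)
scores = [dot(w, x) for x in happy + neutral]
lo, hi = min(scores), max(scores)
happy_attrs = [attribute(w, x, lo, hi) for x in happy]
neutral_attrs = [attribute(w, x, lo, hi) for x in neutral]

# At runtime, a mixture is specified by hand rather than predicted.
emotions = ["happy", "sad", "angry"]
mix = mixture_vector(emotions, {"happy": 0.7, "sad": 0.3})
```

In this sketch the learnt direction `w` orders utterances by how strongly they exhibit one emotion relative to another, and the normalised score plays the role of an attribute. A conversion model conditioned on such a vector could then be given a hand-written mixture like `mix` at inference time.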

Total citations

Cited by 4

Citations per year: 2022: 1, 2023: 1, 2024: 2

Scholar articles

Mixed emotion modelling for emotional voice conversion

K Zhou, B Sisman, C Busso, H Li - computer, 2022
