Authors
Hangbok Lee, Minjae Cho, Hyuk-Yoon Kwon
Publication date
2024/2/26
Journal
Frontiers in Artificial Intelligence
Volume
7
Pages
1259641
Publisher
Frontiers Media SA
Description
In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics in the target speaker's speech. This allows our model to generate the speech of the target speaker with the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately using datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker with the styles of the source speaker. We validate the effectiveness of our model through similarity analysis utilizing five evaluation metrics and showcase real-world examples.
Scholar articles
H Lee, M Cho, HY Kwon - Frontiers in Artificial Intelligence, 2024