Attention-based speech feature transfer between speakers

Authors

Hangbok Lee, Minjae Cho, Hyuk-Yoon Kwon

Publication date

2024/2/26

Journal

Frontiers in Artificial Intelligence

Volume

Pages

1259641

Publisher

Frontiers Media SA

Description

In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics in the target speaker's speech. This allows our model to generate the speech of the target speaker with the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately using datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker with the styles of the source speaker. We validate the effectiveness of our model through similarity analysis utilizing five evaluation metrics and showcase real-world examples.
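The weight-swapping step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each trained model is represented as a plain dictionary of named parameter lists, that attention parameters can be recognised by the substring "attention" in their names, and that the source speaker's attention weights are transplanted into the target speaker's model (the reading consistent with feeding new text to the target model). The function name `transfer_attention` and all parameter names and values are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
Weights = Dict[str, List[float]]


def transfer_attention(donor: Weights, recipient: Weights,
                       marker: str = "attention") -> Weights:
    """Return a copy of `recipient` whose attention parameters come from `donor`.

    Only parameters whose name contains `marker` and that exist in both
    models are copied; everything else keeps the recipient's values.
    """
    # Copy the recipient so neither input model is modified in place.
    merged = {name: list(values) for name, values in recipient.items()}
    for name, values in donor.items():
        if marker in name and name in merged:
            if len(values) != len(merged[name]):
                raise ValueError(f"size mismatch for parameter {name!r}")
            merged[name] = list(values)
    return merged


# Toy checkpoints (made-up numbers) for two separately trained models.
source_model = {
    "encoder.weight": [0.1, 0.2],
    "attention.query.weight": [1.0, 2.0],
    "decoder.weight": [0.3],
}
target_model = {
    "encoder.weight": [0.5, 0.6],
    "attention.query.weight": [9.0, 8.0],
    "decoder.weight": [0.7],
}

# Assumed direction: source attention weights go into the target model.
hybrid = transfer_attention(donor=source_model, recipient=target_model)
```

In this sketch, `hybrid` keeps the target model's encoder and decoder parameters and takes only the attention parameters from the source model. In a real speech synthesis framework the same idea would operate on the framework's own checkpoint format, which the abstract does not specify.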
