Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation

Y Ohtani, T Toda, H Saruwatari, K Shikano - 2006 - naist.repo.nii.ac.jp
The performance of voice conversion has been considerably improved through statistical modeling of spectral sequences. However, the converted speech still contains traces of artificial sounds. To alleviate this, it is necessary to statistically model the source (excitation) sequence as well as the spectral sequence. In this paper, we introduce STRAIGHT mixed excitation into the framework of voice conversion based on a Gaussian Mixture Model (GMM) of the joint probability density of source and target features. We convert both the spectral and the source feature sequences based on Maximum Likelihood Estimation (MLE). Objective and subjective evaluation results demonstrate that the proposed source conversion strongly improves both the quality of the converted speech and the conversion accuracy for speaker individuality.
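As a rough illustration of what "a GMM on the joint probability density of source and target features" buys you, here is a minimal sketch of the simpler frame-wise conditional-mean (minimum mean square error) mapping that such a joint GMM yields. It is not the paper's MLE conversion: maximum-likelihood conversion in this line of work additionally models dynamic (delta) features and estimates the whole output trajectory jointly. The sketch assumes 1-D source and target features and already-trained mixture parameters; the names `JointComponent`, `posterior`, `convert_frame`, and `convert_sequence` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointComponent:
    """One component of a (hypothetical) joint GMM over 1-D source x and 1-D target y."""
    weight: float   # mixture weight P(m)
    mu_x: float     # mean of x in this component
    mu_y: float     # mean of y in this component
    var_xx: float   # variance of x
    cov_yx: float   # cross-covariance between y and x
    # var_yy is not needed for the conditional mean, so it is omitted here.


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def posterior(x: float, comps: List[JointComponent]) -> List[float]:
    """P(m | x) from the marginal of x in each component, computed in the log domain."""
    logs = [math.log(c.weight) + log_gaussian(x, c.mu_x, c.var_xx) for c in comps]
    peak = max(logs)  # subtract the max before exp() to avoid underflow
    scaled = [math.exp(l - peak) for l in logs]
    total = sum(scaled)
    return [s / total for s in scaled]


def convert_frame(x: float, comps: List[JointComponent]) -> float:
    """E[y | x]: posterior-weighted sum of each component's conditional mean of y given x."""
    post = posterior(x, comps)
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(post, comps)
    )


def convert_sequence(xs: List[float], comps: List[JointComponent]) -> List[float]:
    """Convert each source frame independently (no dynamic-feature constraint)."""
    return [convert_frame(x, comps) for x in xs]
```

Because each frame is mapped independently, the output trajectory can be discontinuous between frames. Accounting for dynamic features, as in MLE-based conversion, is aimed at exactly that weakness.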