Towards an improved modeling of the glottal source in statistical parametric speech synthesis

Authors

Joao P Cabral, Steve Renals, Korin Richmond, Junichi Yamagishi

Publication date

2007

Description

This paper proposes the use of the Liljencrants-Fant model (LF-model) to represent the glottal source signal in HMM-based speech synthesis systems. These systems generally use a pulse train to model the periodicity of the excitation signal of voiced speech. However, this model produces a strong and uniform harmonic structure throughout the spectrum of the excitation, which makes the synthetic speech sound buzzy. The use of mixed-band excitation and phase manipulation reduces this effect, but it can degrade speech quality if the noise component is not weighted carefully. In contrast, the LF-waveform has a decaying spectrum at higher frequencies, which is more similar to the real glottal source excitation signal. We conducted a perceptual experiment to test the hypothesis that the LF-model can perform as well as or better than the pulse train in an HMM-based speech synthesizer. In the synthesis, we used the mean values of the LF-parameters, calculated from measurements of the recorded speech. The result of this study is important not only for improving the speech quality of this type of system, but also because the LF-model can be used to model many characteristics of the glottal source, such as voice quality, which are important for voice transformation and generation of expressive speech.
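The spectral argument in the description can be illustrated with a small sketch. It compares the harmonic magnitudes of one pitch period of a pulse train with those of a smooth pulse. The smooth pulse here is a raised cosine used only as a stand-in: it is not the LF-model, and the sampling rate, fundamental frequency, open-phase length, and the helper names `harmonic_magnitudes` and `band_tilt_db` are all assumptions made for illustration, not values or code from the paper.

```python
import cmath
import math


def harmonic_magnitudes(period):
    """Return |DFT| of one pitch period; bin k is the magnitude of harmonic k."""
    n = len(period)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(period)))
        for k in range(n // 2 + 1)
    ]


def band_tilt_db(mags):
    """Energy of the upper half of the harmonics relative to the lower half, in dB."""
    half = (len(mags) - 1) // 2
    low = sum(m * m for m in mags[1:half + 1])    # harmonics 1..half
    high = sum(m * m for m in mags[half + 1:])    # harmonics half+1..Nyquist
    return 10 * math.log10(high / low)


FS = 16000      # sampling rate in Hz (assumed for illustration)
F0 = 200        # fundamental frequency in Hz (assumed for illustration)
N = FS // F0    # samples per pitch period

# Pulse train: a single unit impulse per pitch period.
impulse = [1.0] + [0.0] * (N - 1)

# Stand-in smooth pulse (NOT the LF-model): a raised cosine spanning
# the first half of the period, zero for the remainder.
open_len = N // 2
smooth = [
    0.5 * (1 - math.cos(2 * math.pi * i / open_len)) if i < open_len else 0.0
    for i in range(N)
]

tilt_impulse = band_tilt_db(harmonic_magnitudes(impulse))
tilt_smooth = band_tilt_db(harmonic_magnitudes(smooth))

print(f"pulse train  upper/lower band energy: {tilt_impulse:.1f} dB")
print(f"smooth pulse upper/lower band energy: {tilt_smooth:.1f} dB")
```

Every harmonic of the pulse train has the same magnitude, so its upper and lower bands carry equal energy, whereas the smooth pulse concentrates its energy in the low harmonics. This is only a qualitative picture of why a pulse shape with a decaying spectrum might sound less buzzy; the LF-model itself has a different, parameterized waveform.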

Total citations

Cited by 82

[Chart: citations per year, 2006 to 2022]
