作者
Zhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki
发表日期
2018/5/1
期刊
Acoustical Science and Technology
卷号
39
期号
3
页码范围
234-242
出版商
Acoustical Society of Japan
简介
This paper investigates the importance of temporal cues in the perception of speaker individuality and vocal emotion. Experiments of speaker and vocal-emotion recognition were carried out using an analysis/synthesis method of noise-vocoded speech (NVS). The temporal resolution of NVS was controlled by varying the upper limit of modulation frequency (0, 0.5, 1, 2, 4, 8, 16, 32, and 64Hz). In addition, the role of temporal cue in the different spectral resolution condition was also investigated by varying the number of channels (4, 8, and 16). The results demonstrated that temporal resolution contributes to the recognition of both speaker and vocal emotion. Therefore, temporal cues are found to be important for the perception of not only linguistic information but also speaker individuality and vocal emotion. On the other hand, the performance of speaker recognition was less sensitive to the spectral resolution, at least in the limited set of stimuli in the present study. For vocalemotion recognition, the spectral resolution was shown to be important for recognizing only neutral, joy, and cold anger, but not sadness or hot anger. The important modulation frequency band for the perception of nonlinguistic information was suggested to be higher than that of linguistic information.
引用总数
201820192020202120222023202415113632