作者
Muhammad Sajjad, Soonil Kwon
发表日期
2020/4/27
期刊
IEEE access
卷号
8
页码范围
79861-79875
出版商
IEEE
简介
Emotional state recognition of a speaker is a difficult task for machine learning algorithms which plays an important role in the field of speech emotion recognition (SER). SER plays a significant role in many real-time applications such as human behavior assessment, human-robot interaction, virtual reality, and emergency centers to analyze the emotional state of speakers. Previous research in this field is mostly focused on handcrafted features and traditional convolutional neural network (CNN) models used to extract high-level features from speech spectrograms to increase the recognition accuracy and overall model cost complexity. In contrast, we introduce a novel framework for SER using a key sequence segment selection based on redial based function network (RBFN) similarity measurement in clusters. The selected sequence is converted into a spectrogram by applying the STFT algorithm and passed into …
引用总数