Authors
Mustaqeem, Soonil Kwon
Publication date
2021/9
Journal
International Journal of Intelligent Systems
Volume
36
Issue
9
Pages
5116-5135
Description
Speech signal processing is an active area of research; speech is the dominant means of exchanging information among human beings and a natural channel for human–computer interaction (HCI). Assessing human behavior and recognizing emotion from a speech signal, known as speech emotion recognition (SER), is an emerging area of HCI research with various real-time applications. The performance of an efficient SER system depends on feature learning, which should capture salient and discriminative information such as high-level deep features. In this paper, we propose a two-stream deep convolutional neural network with iterative neighborhood component analysis (INCA) to jointly learn spatial-spectral features and select the most discriminative features for the final prediction. Our model is composed of two channels, each associated with a convolutional neural network structure to extract …