Multichannel CNN-BLSTM architecture for speech emotion recognition system by fusion of magnitude and phase spectral features using DCCA for consumer …

GA Prabhakar, B Basel, A Dutta, CVR Rao
IEEE Transactions on Consumer Electronics, 2023 - ieeexplore.ieee.org
Conventional Speech Emotion Recognition (SER) approaches emphasize magnitude spectrum-based features, such as Mel Frequency Cepstral Coefficients (MFCCs) and the Mel spectrogram. Phase information, however, is usually ignored because of signal processing difficulties such as phase wrapping. This work develops a multichannel Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM) architecture with an attention mechanism for speaker-independent SER that considers both phase and magnitude spectrum-based features. The phase-based features are extracted using the Modified Group Delay Function (MODGD) and combined with MFCC features. The CNN-BLSTM network extracts learned representations from the magnitude and phase features. The learned representations from the MFCCs and the MODGD are combined and given as input to a Support Vector Machine (SVM) for classification. Deep Canonical Correlation Analysis (DCCA) is used to maximize the correlation between magnitude and phase information and so improve on the conventional SER system’s performance. The IEMOCAP database is used for performance analysis. The experimental results show improvement over MFCC features alone and over existing approaches for unimodal SER. The authors also developed a real-time Web server application for the proposed architecture.
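To make the phase-feature step more concrete, the following is a minimal sketch of a modified group delay computation for a single frame, following the commonly used formulation in which the group delay numerator is divided by a cepstrally smoothed spectrum and then compressed. The function name `modgd` and the parameters `n_fft`, `alpha`, `gamma`, and `lifter` (with their default values) are illustrative assumptions; the abstract does not state the settings or the exact variant the authors used.

```python
import numpy as np

def modgd(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=30, eps=1e-10):
    """Modified group delay of one frame (hypothetical helper, not the paper's code).

    alpha, gamma, and lifter are assumed illustrative values.
    lifter must be smaller than n_fft // 2.
    """
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))

    # Spectra of x[n] and n * x[n]; together they give the group delay
    # without explicit phase unwrapping.
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)

    # Cepstrally smoothed magnitude spectrum: keep only low-quefrency terms.
    log_mag = np.log(np.abs(np.fft.fft(x, n_fft)) + eps)
    cep = np.fft.ifft(log_mag).real
    cep[lifter:n_fft - lifter + 1] = 0.0  # symmetric low-pass lifter
    S = np.exp(np.fft.fft(cep).real)[: n_fft // 2 + 1]

    # Group delay numerator over the smoothed spectrum raised to 2 * gamma.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + eps)

    # Compress the dynamic range while keeping the sign.
    return np.sign(tau) * np.abs(tau) ** alpha

# Sanity check: a unit impulse delayed by 3 samples has a flat magnitude
# spectrum, so with alpha = gamma = 1 the result is the delay itself.
impulse = np.zeros(64)
impulse[3] = 1.0
delay = modgd(impulse, n_fft=64, alpha=1.0, gamma=1.0, lifter=8)
```

In a pipeline like the one the abstract describes, such per-frame vectors would presumably be stacked over time into a phase feature map alongside the MFCC channel before being passed to the network; that arrangement is an assumption here, not something the abstract specifies.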