Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition

NT Pham, DNM Dang, ND Nguyen, TT Nguyen… - Expert Systems with …, 2023 - Elsevier
Recently, speech emotion recognition (SER) has become an active research area in speech
processing, particularly with the advent of deep learning (DL). Numerous DL-based methods …

SERVER: Multi-modal speech emotion recognition using transformer-based and vision-based embeddings

NT Pham, DNM Dang, BNH Pham… - Proceedings of the 2023 …, 2023 - dl.acm.org
This paper proposes a multi-modal approach for speech emotion recognition (SER) using
both text and audio inputs. The audio embedding is extracted by using a vision-based …

Multi-modal speech emotion recognition: Improving accuracy through fusion of VGGish and BERT features with multi-head attention

PN Tran, TDT Vu, DNM Dang, NT Pham… - … Conference on Industrial …, 2023 - Springer
Recent research has shown that multi-modal learning is a successful method for enhancing
classification performance by mixing several forms of input, notably in speech-emotion …

SER-Fuse: An Emotion Recognition Application Utilizing Multi-Modal, Multi-Lingual, and Multi-Feature Fusion

NT Pham, LT Phan, DNM Dang… - Proceedings of the 12th …, 2023 - dl.acm.org
Speech emotion recognition (SER) is a crucial aspect of affective computing and
human-computer interaction, yet effectively identifying emotions in different speakers and languages …