NT Pham, DNM Dang, BNH Pham… - Proceedings of the 2023 …, 2023 - dl.acm.org
This paper proposes a multi-modal approach for speech emotion recognition (SER) using both text and audio inputs. The audio embedding is extracted by using a vision-based …
Recent research has shown that multi-modal learning is a successful method for enhancing classification performance by mixing several forms of input, notably in speech-emotion …
NT Pham, LT Phan, DNM Dang… - Proceedings of the 12th …, 2023 - dl.acm.org
Speech emotion recognition (SER) is a crucial aspect of affective computing and human-computer interaction, yet effectively identifying emotions in different speakers and languages …