Improving emotion classification through variational inference of latent variables

S. Parthasarathy, V. Rozgic, M. Sun… - ICASSP 2019, IEEE International Conference on Acoustics, Speech and Signal Processing, 2019 - ieeexplore.ieee.org
Conventional models for emotion recognition from speech signals are trained in a supervised fashion using speech utterances with emotion labels. In this study we hypothesize that the speech signal depends on multiple latent variables, including the emotional state, age, gender, and speech content. We propose an Adversarial Autoencoder (AAE) to perform variational inference over the latent variables and reconstruct the input feature representations. Reconstruction of the feature representations is used as an auxiliary task to aid the primary emotion recognition task. Experiments on the IEMOCAP dataset demonstrate that the auxiliary learning tasks improve emotion classification accuracy compared to a baseline supervised classifier. Further, we demonstrate that the proposed learning approach can be used for end-to-end speech emotion recognition, as it is applicable to models that operate on frame-level inputs.
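The core idea of the auxiliary-task setup can be sketched as a multi-task objective: the primary cross-entropy loss for emotion classification plus a weighted reconstruction loss on the input features. The sketch below is a minimal NumPy illustration of that combined objective only; the function names and the weighting constant `alpha` are illustrative assumptions, and the adversarial regularization over the latent variables described in the abstract is omitted for brevity.

```python
import numpy as np

def cross_entropy(probs, label):
    """Primary loss: negative log-likelihood of the true emotion class."""
    return -np.log(probs[label])

def reconstruction_mse(x, x_hat):
    """Auxiliary loss: mean squared error between the input feature
    vector and the autoencoder's reconstruction of it."""
    return np.mean((x - x_hat) ** 2)

def total_loss(probs, label, x, x_hat, alpha=0.5):
    """Combined objective: primary emotion loss plus an alpha-weighted
    auxiliary reconstruction term (alpha is a hypothetical weight)."""
    return cross_entropy(probs, label) + alpha * reconstruction_mse(x, x_hat)

# Toy usage: 4 emotion classes, a 3-dimensional feature vector.
probs = np.array([0.1, 0.7, 0.1, 0.1])   # classifier output
x = np.array([1.0, 0.0, -1.0])           # input features
x_hat = np.array([0.9, 0.1, -0.8])       # reconstructed features
loss = total_loss(probs, label=1, x=x, x_hat=x_hat)
```

In the paper's actual model both terms would be backpropagated through a shared encoder, so the reconstruction gradient regularizes the representation used by the emotion classifier.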