Wild wild emotion: a multimodal ensemble approach

J Gideon, B Zhang, Z Aldeneh, Y Kim… - Proceedings of the 18th …, 2016 - dl.acm.org
Automatic emotion recognition from audio-visual data is a topic that has been broadly explored using data captured in the laboratory. However, these data are not necessarily representative of how emotion is manifested in the real world. In this paper, we describe our system for the 2016 Emotion Recognition in the Wild challenge. We use the Acted Facial Expressions in the Wild database 6.0 (AFEW 6.0), which contains short clips of popular TV shows and movies and has more variability in the data compared to laboratory recordings. We explore a set of features that incorporate information from facial expressions and speech, in addition to cues from the background music and overall scene. In particular, we propose the use of a feature set composed of dimensional emotion estimates trained from outside acoustic corpora. We design sets of multiclass and pairwise (one-versus-one) classifiers and fuse the resulting systems. Our fusion increases the performance from a baseline of 38.81% to 43.86% and from 40.47% to 46.88%, for validation and test sets, respectively. While the video features perform better than audio features alone, a combination of the two modalities achieves the greatest performance, with gains of 4.4% and 1.4%, with and without information gain, respectively. Because of the flexible design of the fusion, it is easily adaptable to other multimodal learning problems.
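The multiclass-plus-pairwise ensemble with late score fusion can be sketched in miniature. This is an illustrative example only, assuming scikit-learn and synthetic data in place of the paper's AFEW 6.0 audio-visual features; the base classifier, score normalization, and equal-weight fusion below are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier

# Synthetic stand-in for extracted multimodal features (7 emotion classes,
# matching the AFEW label set size; the features themselves are fabricated).
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# System 1: a single multiclass classifier over all seven classes.
multi = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# System 2: a pairwise (one-versus-one) ensemble, one binary classifier
# per pair of classes, whose votes are aggregated into per-class scores.
pair = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

def zscore(s):
    # Standardize each system's scores so neither dominates the fusion
    # purely through scale.
    return (s - s.mean()) / (s.std() + 1e-9)

# Late fusion: sum the standardized per-class scores, then take the argmax.
fused = zscore(multi.decision_function(X_te)) + zscore(pair.decision_function(X_te))
pred = fused.argmax(axis=1)
```

The same pattern extends to more than two systems (e.g. separate audio and video classifiers): stack each system's standardized class-score matrix and combine before the final argmax, which is what makes this style of fusion easy to adapt to other multimodal problems.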