Authors
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps, Björn Schuller
Publication date
2020/4/1
Journal
IEEE Transactions on Affective Computing
Publisher
IEEE
Description
Despite the emerging importance of Speech Emotion Recognition (SER), state-of-the-art accuracy remains quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model in general. In this article, we propose a solution to this problem: a multi-task learning framework that uses auxiliary tasks for which data is abundantly available. We show that utilisation of this additional data can improve the primary task of SER, for which only limited labelled data is available. In particular, we use gender identification and speaker recognition as auxiliary tasks, which allow the use of very large datasets, e.g., speaker classification datasets. To maximise the benefit of multi-task learning, we further use an adversarial autoencoder (AAE) within our framework, which …
Total citations
[Citations-per-year chart, 2019–2024; individual yearly counts not recoverable]
Scholar articles
S Latif, R Rana, S Khalifa, R Jurdak, J Epps… - IEEE Transactions on Affective computing, 2020
R Rana, S Latif, S Khalifa, R Jurdak, J Epps - arXiv preprint arXiv:1907.06078, 2019