Authors
Seunghyun Yoon, Seokhyun Byun, Kyomin Jung
Publication date
2018/12/18
Workshop paper
2018 IEEE spoken language technology workshop (SLT)
Pages
112-118
Publisher
IEEE
Description
Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model …
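The dual recurrent encoder described above can be illustrated with a minimal numpy sketch: one recurrent encoder per modality (audio frames and word embeddings), with the two final hidden states concatenated and fed to a softmax emotion classifier. All dimensions, weight initializations, and the plain Elman-style RNN cell are hypothetical placeholders, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

# Hypothetical dimensions; the paper's real hyperparameters are not given here.
AUDIO_DIM, TEXT_DIM, HIDDEN, N_CLASSES = 40, 100, 64, 4

rng = np.random.default_rng(0)

def rnn_encode(seq, W_x, W_h):
    """Plain (Elman-style) RNN: return the final hidden state of a sequence."""
    h = np.zeros(W_h.shape[0])
    for x in seq:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

# Separate recurrent encoders for the audio and text streams (random weights).
Wx_a, Wh_a = rng.normal(0, 0.1, (HIDDEN, AUDIO_DIM)), rng.normal(0, 0.1, (HIDDEN, HIDDEN))
Wx_t, Wh_t = rng.normal(0, 0.1, (HIDDEN, TEXT_DIM)), rng.normal(0, 0.1, (HIDDEN, HIDDEN))
W_out = rng.normal(0, 0.1, (N_CLASSES, 2 * HIDDEN))

def predict_emotion(audio_seq, text_seq):
    h_audio = rnn_encode(audio_seq, Wx_a, Wh_a)   # signal-level encoding
    h_text = rnn_encode(text_seq, Wx_t, Wh_t)     # language-level encoding
    fused = np.concatenate([h_audio, h_text])     # combine both modalities
    logits = W_out @ fused
    probs = np.exp(logits - logits.max())         # numerically stable softmax
    return probs / probs.sum()

# One utterance: 50 frames of acoustic features, 12 word embeddings.
audio = rng.normal(size=(50, AUDIO_DIM))
text = rng.normal(size=(12, TEXT_DIM))
p = predict_emotion(audio, text)
print(p.shape)  # (4,)
```

Training the weights (e.g. by cross-entropy and backpropagation) and replacing the plain RNN cells with gated units would be needed for a usable classifier; the sketch only shows how the two encoders' outputs are fused before classification.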
Total citations
[Per-year citation chart, 2019–2024; individual counts not legible in the extracted text]
Scholar articles
S Yoon, S Byun, K Jung - 2018 IEEE spoken language technology workshop …, 2018