Authors
Dingkang Yang, Shuai Huang, Yang Liu, Lihua Zhang
Publication date
2022/9/29
Journal
IEEE Signal Processing Letters
Volume
29
Pages
2093-2097
Publisher
IEEE
Description
Speech emotion recognition combining linguistic content and audio signals in the dialog is a challenging task. Nevertheless, previous approaches have failed to explore emotion cues in contextual interactions and ignored the long-range dependencies between elements from different modalities. To tackle the above issues, this letter proposes a multimodal speech emotion recognition method using audio and text data. We first present a contextual transformer module to introduce contextual information via embedding the previous utterances between interlocutors, which enhances the emotion representation of the current utterance. Then, the proposed cross-modal transformer module focuses on the interactions between text and audio modalities, adaptively promoting the fusion from one modality to another. Furthermore, we construct associative topological relation over mini-batch and learn the association between …
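For orientation, below is a minimal sketch of the cross-modal attention idea the abstract describes: one modality (here, text tokens) forms the queries and attends over the other modality (audio frames) to drive fusion. The module name, feature dimensions, and the residual/LayerNorm combination are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-modal block: text queries attend over audio keys/values."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, audio_feats):
        # text_feats: (batch, T_text, dim); audio_feats: (batch, T_audio, dim)
        fused, _ = self.attn(query=text_feats, key=audio_feats, value=audio_feats)
        # Residual connection keeps the original text representation,
        # enriched by the audio context it attended to.
        return self.norm(text_feats + fused)

# Usage with random features standing in for utterance embeddings
text = torch.randn(2, 20, 256)    # token-level text embeddings
audio = torch.randn(2, 50, 256)   # frame-level audio embeddings
out = CrossModalAttention()(text, audio)
print(out.shape)  # torch.Size([2, 20, 256])
```

A symmetric block with audio as queries and text as keys/values would give fusion in the other direction, matching the bidirectional "one modality to another" promotion the abstract mentions.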