Authors
Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D Plumbley
Publication date
2017/5/14
Conference paper
2017 International Joint Conference on Neural Networks (IJCNN)
Pages
3461-3466
Publisher
IEEE
Abstract
Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn on the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method …
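The abstract mentions mel-filter banks (MFBs) as one of the input representations fed to the CNN. A minimal numpy sketch of such a log-mel front end is given below; the parameter choices (16 kHz sample rate, 512-point FFT, 40 mel bands) are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i, k] = (right - k) / (right - center)
    return fb

def log_mel(wave, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the waveform, apply a Hann window, take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Project onto the mel filters and compress with a log.
    return np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)

feats = log_mel(np.random.randn(16000))  # one second of audio
print(feats.shape)  # (61, 40): 61 frames x 40 mel bands
```

A time-by-band matrix like this is what a CNN front end would consume before the GRU layers model the longer-term temporal structure.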
Total citations
Cited-by histogram, 2017–2024 (per-year counts garbled in extraction)