Authors
Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D Plumbley
Publication date
2017/5/14
Conference paper
2017 International Joint Conference on Neural Networks (IJCNN)
Pages
3461-3466
Publisher
IEEE
Abstract
Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn on the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method …
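The abstract mentions mel-filter banks (MFBs) as one of the input representations fed to the CNN. A minimal numpy sketch of such a log-mel front end is given below; the parameter choices (16 kHz sample rate, 512-point FFT, 40 mel bands) are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i, k] = (right - k) / (right - center)
    return fb

def log_mel(wave, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the waveform, apply a Hann window, take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Project onto the mel filters and compress with a log.
    return np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)

feats = log_mel(np.random.randn(16000))  # one second of audio
print(feats.shape)  # (61, 40): 61 frames x 40 mel bands
```

A time-by-band matrix like this is what a CNN front end would consume before the GRU layers model the longer-term temporal structure.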
Total citations
Cited-by histogram, 2017–2024 (per-year counts garbled in extraction)