Authors
Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D Plumbley
Publication date
2020/8/12
Journal
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume
28
Pages
2450-2460
Publisher
IEEE
Description
Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of the SED task is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled. That is, there are only audio tags for each audio clip, without the onset and offset times of sound events. We compare segment-wise and clip-wise training for SED, a comparison that is lacking in previous works. We propose a convolutional neural network transformer (CNN-Transformer) for audio tagging and SED, and show that CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN). Another challenge of SED is that thresholds are required for detecting sound events. Previous works set thresholds empirically, which is not an optimal approach. To solve this problem, we propose an automatic threshold optimization method. The first stage is to optimize …
Total citations
[Bar chart: citations per year, 2020 to 2024]
Scholar articles