Sound event detection of weakly labelled data with CNN-transformer and automatic threshold optimization

Authors

Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D Plumbley

Publication date

2020/8/12

Journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume

Pages

2450-2460

Publisher

IEEE

Description

Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of SED is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled: each audio clip has only audio tags, without the onset and offset times of sound events. We compare segment-wise and clip-wise training for SED, a comparison lacking in previous works. We propose a convolutional neural network Transformer (CNN-Transformer) for audio tagging and SED, and show that CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN). Another challenge of SED is that thresholds are required for detecting sound events. Previous works set thresholds empirically, which is not an optimal approach. To solve this problem, we propose an automatic threshold optimization method. The first stage is to optimize …
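The two challenges named in the description (clip-level weak labels, and thresholds for turning probabilities into detections) can be illustrated with a minimal sketch. It assumes frame-wise class probabilities are already available from some model (for example a CNN-Transformer or CRNN). Max pooling aggregates them into a clip-level probability that can be compared with weak tags (an assumed, common aggregation; the paper's exact pooling is not given in this description), a threshold turns frames into (onset, offset) events, and a plain grid search picks a per-class threshold that maximizes clip-level F1. The grid search is a generic baseline swapped in for illustration, not the automatic threshold optimization method proposed in the paper, whose details are truncated above. All function names, the `hop_seconds` parameter, and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def clip_probability(frame_probs: Sequence[float]) -> float:
    """Aggregate one class's frame-wise probabilities into a clip-level
    probability using max pooling (an assumed, common choice)."""
    return max(frame_probs)


def detect_events(frame_probs: Sequence[float], threshold: float,
                  hop_seconds: float) -> List[Tuple[float, float]]:
    """Return (onset, offset) times in seconds for each run of consecutive
    frames whose probability is >= threshold."""
    events: List[Tuple[float, float]] = []
    onset: Optional[int] = None
    for i, p in enumerate(frame_probs):
        if p >= threshold and onset is None:
            onset = i  # an event starts at this frame
        elif p < threshold and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None  # the event ended at the previous frame
    if onset is not None:  # event still active at the end of the clip
        events.append((onset * hop_seconds, len(frame_probs) * hop_seconds))
    return events


def f1_score(preds: Sequence[bool], labels: Sequence[bool]) -> float:
    """Clip-level F1 for one class from boolean predictions and weak labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if y and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def search_threshold(clip_probs: Sequence[float], clip_labels: Sequence[bool],
                     candidates: Sequence[float]) -> float:
    """Grid-search the candidate threshold that maximizes clip-level F1.
    This is a generic baseline, NOT the paper's optimization method."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_score([p >= t for p in clip_probs], clip_labels)
        if f1 > best_f1:  # ties keep the earlier candidate
            best_t, best_f1 = t, f1
    return best_t


if __name__ == "__main__":
    # Toy frame-wise probabilities for one class in three clips (made-up data).
    clips = [[0.1, 0.8, 0.9, 0.2], [0.1, 0.2, 0.3, 0.1], [0.05, 0.6, 0.7, 0.1]]
    tags = [True, False, True]  # weak (clip-level) labels only
    probs = [clip_probability(c) for c in clips]
    t = search_threshold(probs, tags, [0.2, 0.4, 0.6, 0.8])
    print(t)                                # 0.4
    print(detect_events(clips[0], t, 0.5))  # [(0.5, 1.5)]
```

Tuning the threshold on clip-level F1 only needs the weak tags, which is why it is usable on weakly labelled data; a real system would tune one threshold per class on a held-out set and would likely use a finer search than this fixed grid.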

Total citations

Cited by 129

Citations per year: 2020: 7; 2021: 25; 2022: 49; 2023: 40; 2024: 7