Authors
Shoba Sivapatham, Asutosh Kar, Rajavel Ramadoss
Publication date
2021/4/1
Journal
Applied Acoustics
Volume
175
Pages
107817
Publisher
Elsevier
Description
Denoising single-channel speech (speech recorded with one microphone) remains an open problem in many speech-related applications. Recently, supervised deep learning methods have been used to denoise speech signals. This work uses a Deep Neural Network (DNN) to learn the Time–Frequency (T-F) mask of the clean speech from the features of the noisy speech. In general, the Ideal Binary Mask (IBM) is used as a binary training target to improve speech intelligibility, and the Ideal Ratio Mask (IRM) is used as a non-binary training target to improve speech quality. Still, these may not be the best T-F masks for improving speech quality and intelligibility, and the appropriate training target for supervised deep learning methods remains unclear. In this work, a novel non-binary soft T-F mask named Optimum Soft Mask (OSM) is proposed, analyzed and compared with different T …
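The abstract contrasts the IBM and the IRM as DNN training targets. The sketch below uses the definitions that are common in the supervised speech-separation literature. The IBM is 1 where the local SNR exceeds a local criterion and 0 elsewhere. The IRM is (S² / (S² + N²))^β. The sketch makes several assumptions that the record does not state: clean-speech and noise magnitude spectrograms are given as 2-D lists, the local criterion is 0 dB, and β = 0.5. The function names are hypothetical. This is not the paper's proposed OSM, because this excerpt does not define it.

```python
import math

EPS = 1e-12  # small constant to avoid log(0) and division by zero in silent bins


def ideal_binary_mask(clean_mag, noise_mag, lc_db=0.0):
    """IBM: 1 where the local SNR (in dB) exceeds the local criterion lc_db, else 0."""
    mask = []
    for s_row, n_row in zip(clean_mag, noise_mag):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s * s + EPS) / (n * n + EPS))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5):
    """IRM: (S^2 / (S^2 + N^2)) ** beta per T-F bin; beta = 0.5 is a common choice."""
    return [
        [((s * s) / (s * s + n * n + EPS)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(clean_mag, noise_mag)
    ]


def apply_mask(noisy_mag, mask):
    """Estimate the clean magnitude by scaling each noisy T-F bin by its mask value."""
    return [[m * y for m, y in zip(m_row, y_row)] for m_row, y_row in zip(mask, noisy_mag)]


if __name__ == "__main__":
    clean = [[3.0, 0.5]]   # one frame, two frequency bins (toy values)
    noise = [[1.0, 1.0]]
    print(ideal_binary_mask(clean, noise))  # [[1, 0]]
    print(ideal_ratio_mask(clean, noise))   # approx [[0.9487, 0.4472]]
```

In a supervised setup such as the one the abstract describes, a mask like this would serve as the DNN's training target. At inference time, the predicted mask would be applied to the noisy spectrogram, as `apply_mask` does. The binary mask discards low-SNR bins outright. The ratio mask attenuates them gradually, which is why it is usually associated with better perceived quality.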