
Performance analysis of various training targets for improving speech quality and intelligibility

Authors

Shoba Sivapatham, Asutosh Kar, Rajavel Ramadoss

Publication date

2021/4/1

Journal

Applied Acoustics

Volume

175

Pages

107817

Publisher

Elsevier

Description

Denoising single-channel speech (recorded using one microphone) remains an open problem in many speech-related applications. Recently, supervised deep learning methods have been used to denoise the speech signal. This work uses a Deep Neural Network (DNN) to learn the Time–Frequency (T-F) mask of the clean speech from noisy speech features. In general, the Ideal Binary Mask (IBM) is used as the binary training target to improve speech intelligibility, and the Ideal Ratio Mask (IRM) is used as a non-binary training target to improve speech quality. Still, neither is necessarily the best T-F mask for improving speech quality or intelligibility, and an appropriate training target for supervised deep learning methods remains unclear. In this work, a novel non-binary soft T-F mask named Optimum Soft Mask (OSM) is proposed, analyzed and compared with different T …
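To make the two baseline targets named in the abstract concrete, here is a minimal sketch of how an IBM and an IRM are commonly computed from per-bin speech and noise power. It uses the widely cited textbook definitions (a local SNR threshold for the IBM, a power ratio raised to an exponent for the IRM), which may differ from the exact formulation in the paper. The function names, the 0 dB threshold `lc_db`, and the exponent `beta=0.5` are assumptions chosen for illustration. The OSM is not shown because its definition does not appear in this excerpt.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Binary mask: 1 where the local SNR in dB exceeds lc_db, else 0.

    Inputs are 2-D lists (frames x frequency bins) of non-negative power values.
    lc_db is an assumed local criterion, not a value taken from the paper.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            # Floor both terms so silent bins do not trigger log(0) or division by zero.
            snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Soft mask in [0, 1]: (S / (S + N)) ** beta for each T-F bin.

    beta=0.5 is a commonly used exponent, assumed here for illustration.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Toy 2x2 power spectra (hypothetical values, not data from the paper).
    speech = [[4.0, 1.0], [0.0, 9.0]]
    noise = [[1.0, 4.0], [1.0, 1.0]]
    print(ideal_binary_mask(speech, noise))
    print(ideal_ratio_mask(speech, noise))
```

In a supervised setup of the kind the abstract describes, a mask like one of these would serve as the regression or classification target that the DNN learns to predict from noisy speech features.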

Total citations

Cited by 12

2021: 3, 2022: 4, 2023: 5

Scholar articles

Performance analysis of various training targets for improving speech quality and intelligibility

S Sivapatham, A Kar, R Ramadoss - Applied Acoustics, 2021
