A DNN-based accurate masking using significant feature sets

S Sivapatham, P Goel, S Burra… - … Conference on ICT …, 2022 - ieeexplore.ieee.org
S Sivapatham, P Goel, S Burra, P Sooraksa, A Kar
2022 20th International Conference on ICT and Knowledge …, 2022 - ieeexplore.ieee.org
Monaural speech separation has long remained a challenging problem. It can be addressed with a supervised learning approach that uses features of the noisy input to predict an accurate time-frequency mask. Effective acoustic-phonetic features can help in accurate mask prediction at low signal-to-noise ratios (SNRs). Individual features capture specific attributes of the audio signal; therefore, it is essential to employ a set of features. This work examines different combinations of monaural features as input, with the ideal ratio mask as the training target for the DNN model. Feature combination sets are constructed by examining single features and then combining the most relevant ones. The results are evaluated for different feature combinations under non-stationary noises at low SNR levels. Feature performance is evaluated using intelligibility and quality measures. A combination of two features is considered the best feature combination, as it yields a significant increase in speech intelligibility compared with individual features and with combinations of more than two features.
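To make the training target concrete, the following is a minimal sketch of how an ideal ratio mask is commonly computed from the magnitude spectrograms of clean speech and noise. The abstract does not state the exact formulation used in the paper, so the function name `ideal_ratio_mask`, the exponent `beta=0.5`, the stabilizing constant `eps`, and the random stand-in spectrograms are all assumptions for illustration.

```python
import numpy as np


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    """Commonly used IRM form: (S^2 / (S^2 + N^2)) ** beta.

    speech_mag, noise_mag: magnitude spectrograms of equal shape
    (frequency bins x time frames). beta and eps are assumed values,
    not taken from the paper.
    """
    speech_pow = np.square(speech_mag)
    noise_pow = np.square(noise_mag)
    # eps guards against division by zero in silent time-frequency units.
    return (speech_pow / (speech_pow + noise_pow + eps)) ** beta


# Random stand-ins for real spectrograms (hypothetical data).
rng = np.random.default_rng(0)
speech_mag = np.abs(rng.normal(size=(5, 4)))
noise_mag = np.abs(rng.normal(size=(5, 4)))

irm = ideal_ratio_mask(speech_mag, noise_mag)
# Each mask value lies in [0, 1]; values near 1 mark speech-dominated units.
```

In a supervised setup of the kind the abstract describes, a mask like this would serve as the regression target, and the predicted mask would be multiplied element-wise with the noisy magnitude spectrogram to estimate the speech.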