Controlling the noise robustness of end-to-end automatic speech recognition systems

M Möller, J Twiefel, C Weber… - 2021 International Joint …, 2021 - ieeexplore.ieee.org
2021 International Joint Conference on Neural Networks (IJCNN), 2021 - ieeexplore.ieee.org
In this work, we propose a novel training scheme to modularize end-to-end systems. The scheme alters the flow of information in an end-to-end system so that the kernels of this system can be reused by another system that fulfills a different task. We apply this scheme to extract the noise reduction capabilities of a noise-robust automatic speech recognition (ASR) system and build a speech enhancer from them. This enhancer receives spectral representations of unfiltered audio and outputs cleaned spectral representations. Our enhancer can be integrated into an ASR system as a front-end, is trainable, and reduces background noise. The front-end uses a decoder to clean speech based on the hidden activations of the ASR system Jasper. During training, we adapt only the weights of our decoder and the batch normalization in Jasper. The resulting spectral representations show less background noise. Furthermore, areas of the spectral features that do not contribute to speech recognition are not reconstructed. We demonstrate that our front-end can be combined with a pre-trained ASR system as a back-end and supports speech recognition in noisy conditions. We also show that training another ASR system with our front-end improves that system's performance in both noisy and noiseless conditions. The improvement is most pronounced on challenging speech datasets.
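The abstract states that only the decoder weights and Jasper's batch normalization are adapted during training, while the rest of Jasper stays fixed. The following minimal Python sketch shows one way to express that split as a filter over parameter names. The parameter names (`encoder.*.bn.*`, `decoder.*`) and the helpers `is_trainable` and `partition` are hypothetical illustrations, not taken from the paper or from any Jasper implementation.

```python
from typing import List, Tuple

# Hypothetical parameter names; a real Jasper checkpoint uses its own naming scheme.
PARAMS: List[str] = [
    "encoder.block0.conv.weight",
    "encoder.block0.bn.weight",
    "encoder.block0.bn.bias",
    "encoder.block1.conv.weight",
    "encoder.block1.bn.weight",
    "encoder.block1.bn.bias",
    "decoder.deconv0.weight",
    "decoder.deconv0.bias",
]


def is_trainable(name: str) -> bool:
    """Return True if a parameter should be updated while training the enhancer.

    Assumption: decoder parameters live under the 'decoder.' prefix and
    batch-normalization parameters inside the Jasper encoder contain '.bn.'.
    """
    if name.startswith("decoder."):
        return True  # the new decoder is trained from scratch
    # inside Jasper, only batch-normalization parameters are adapted
    return name.startswith("encoder.") and ".bn." in name


def partition(names: List[str]) -> Tuple[List[str], List[str]]:
    """Split parameter names into (trainable, frozen), preserving order."""
    trainable = [n for n in names if is_trainable(n)]
    frozen = [n for n in names if not is_trainable(n)]
    return trainable, frozen


trainable, frozen = partition(PARAMS)
print("trainable:", trainable)
print("frozen:", frozen)
```

In a framework such as PyTorch, the same predicate could decide `requires_grad` for each entry returned by `named_parameters()`. Batch-normalization running statistics are buffers rather than parameters, so they are updated whenever the layer is in training mode, independently of this filter.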