A study on data augmentation of reverberant speech for robust speech recognition

T Ko, V Peddinti, D Povey, ML Seltzer… - … on acoustics, speech …, 2017 - ieeexplore.ieee.org
The environmental robustness of DNN-based acoustic models can be significantly improved
by using multi-condition training data. However, as data collection is a costly proposition …

Light gated recurrent units for speech recognition

M Ravanelli, P Brakel, M Omologo… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
A field that has directly benefited from the recent advances in deep learning is automatic
speech recognition (ASR). Despite the great achievements of the past decades, however, a …

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

E Vincent, S Watanabe, AA Nugraha, J Barker… - Computer Speech & …, 2017 - Elsevier
Speech enhancement and automatic speech recognition (ASR) are most often evaluated in
matched (or multi-condition) settings where the acoustic conditions of the training data …

Improved MVDR beamforming using single-channel mask prediction networks

H Erdogan, JR Hershey, S Watanabe, MI Mandel… - Interspeech, 2016 - isca-archive.org
Recent studies on multi-microphone speech databases indicate that it is beneficial to
perform beamforming to improve speech recognition accuracies, especially when there is a …

A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research

K Kinoshita, M Delcroix, S Gannot, EAP Habets… - EURASIP Journal on …, 2016 - Springer
In recent years, substantial progress has been made in the field of reverberant speech
signal processing, including both single- and multichannel dereverberation techniques and …

SpeakerBeam: Speaker aware neural network for target speaker extraction in speech mixtures

K Žmolíková, M Delcroix, K Kinoshita… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
The processing of speech corrupted by interfering overlapping speakers is one of the
challenging problems with regards to today's automatic speech recognition systems …

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

AdvPulse: Universal, synchronization-free, and targeted audio adversarial attacks via subsecond perturbations

Z Li, Y Wu, J Liu, Y Chen, B Yuan - Proceedings of the 2020 ACM …, 2020 - dl.acm.org
Existing efforts in audio adversarial attacks only focus on the scenarios where an adversary
has prior knowledge of the entire speech input so as to generate an adversarial example by …

Robust audio adversarial example for a physical attack

H Yakura, J Sakuma - arXiv preprint arXiv:1810.11793, 2018 - arxiv.org
We propose a method to generate audio adversarial examples that can attack a state-of-the-art
speech recognition model in the physical world. Previous work assumes that generated …

An exploration of self-supervised pretrained representations for end-to-end speech recognition

X Chang, T Maekaku, P Guo, J Shi… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Self-supervised pretraining on speech data has achieved a lot of progress. High-fidelity
representation of the speech signal is learned from a lot of untranscribed data and shows …