Predicting speech intelligibility with deep neural networks

C Spille, SD Ewert, B Kollmeier, BT Meyer - Computer Speech & Language, 2018 - Elsevier
An accurate objective prediction of human speech intelligibility is of interest for many
applications such as the evaluation of signal processing algorithms. To predict the speech …

Multistream CNN for robust acoustic modeling

KJ Han, J Pan, VKN Tadala, T Ma… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
This paper proposes multistream CNN, a novel neural network architecture for robust
acoustic modeling in speech recognition tasks. The proposed architecture processes input …

MSTRE-Net: Multistreaming acoustic modeling for automatic lyrics transcription

E Demirel, S Ahlbäck, S Dixon - arXiv preprint arXiv:2108.02625, 2021 - arxiv.org
This paper makes several contributions to automatic lyrics transcription (ALT) research. Our
main contribution is a novel variant of the Multistreaming Time-Delay Neural Network …

Coding and decoding of messages in human speech communication: Implications for machine recognition of speech

H Hermansky - Speech Communication, 2019 - Elsevier
This paper postulates that linguistic message in speech is coded redundantly in both the
time and the frequency domains. Such redundant coding of the message in the signal …

A cross-task transfer learning approach to adapting deep speech enhancement models to unseen background noise using paired senone classifiers

S Wang, W Li, SM Siniscalchi… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We propose an environment adaptation approach that improves deep speech enhancement
models via minimizing the Kullback-Leibler divergence between posterior probabilities …

Novel neural network based fusion for multistream ASR

SH Mallidi, H Hermansky - 2016 IEEE International Conference …, 2016 - ieeexplore.ieee.org
Robustness of automatic speech recognition (ASR) to acoustic mismatches can be improved
by multistream framework. Frequently used approach to combine decisions from individual …

Uncertainty estimation of DNN classifiers

SH Mallidi, T Ogawa… - 2015 IEEE Workshop on …, 2015 - ieeexplore.ieee.org
New efficient measures for estimating uncertainty of deep neural network (DNN) classifiers
are proposed and successfully applied to multistream-based unsupervised adaptation of …

M-vectors: sub-band based energy modulation features for multi-stream automatic speech recognition

S Sadhu, R Li, H Hermansky - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
In this paper, we propose a novel method to capture energy modulations from different
frequency bands in speech into frame-level feature vectors, called Modulation-vectors or M …

Stream Attention for Distributed Multi-Microphone Speech Recognition

X Wang, R Li, H Hermansky - Interspeech, 2018 - isca-archive.org
Exploiting multiple microphones has been a widely-used strategy for robust automatic
speech recognition (ASR). Particularly, in a general hands-free scenario, acquisition of …

Single-ended speech quality prediction based on automatic speech recognition

R Huber, J Ooster, BT Meyer - Journal of the Audio Engineering Society, 2018 - aes.org
Quality evaluation of digitally-transmitted speech is an important prerequisite to ensure the
required quality of telecommunication service. Although formal subjective listening tests still …