End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

The application of hidden Markov models in speech recognition

M Gales, S Young - Foundations and Trends® in Signal …, 2008 - nowpublishers.com
Full text available at: http://dx.doi.org/10.1561/2000000004 …

Minimum phone error and I-smoothing for improved discriminative training

D Povey, PC Woodland - 2002 IEEE international conference …, 2002 - ieeexplore.ieee.org
In this paper we introduce the Minimum Phone Error (MPE) and Minimum Word Error (MWE)
criteria for the discriminative training of HMM systems. The MPE/MWE criteria are smoothed …

[BOOK][B] Distant speech recognition

M Wölfel, J McDonough - 2009 - books.google.com
A complete overview of distant automatic speech recognition. The performance of
conventional Automatic Speech Recognition (ASR) systems degrades dramatically as soon …

[PDF][PDF] Discriminative training for large vocabulary speech recognition

D Povey - 2005 - researchgate.net
This thesis investigates the use of discriminative criteria for training HMM parameters for
speech recognition, in particular the Maximum Mutual Information (MMI) criterion and a new …

Large scale discriminative training of hidden Markov models for speech recognition

PC Woodland, D Povey - Computer Speech & Language, 2002 - Elsevier
This paper describes, and evaluates on a large scale, the lattice based framework for
discriminative training of large vocabulary speech recognition systems based on Gaussian …

Building DNN acoustic models for large vocabulary speech recognition

AL Maas, P Qi, Z Xie, AY Hannun, CT Lengerich… - Computer Speech & …, 2017 - Elsevier
Understanding architectural choices for deep neural networks (DNNs) is crucial to improving
state-of-the-art speech recognition systems. We investigate which aspects of DNN acoustic …

[PDF][PDF] Contrastive estimation: Training log-linear models on unlabeled data

NA Smith, J Eisner - Proceedings of the 43rd Annual Meeting of …, 2005 - aclanthology.org
Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling
tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum …

Interacting with computers by voice: automatic speech recognition and synthesis

D O'Shaughnessy - Proceedings of the IEEE, 2003 - ieeexplore.ieee.org
This paper examines how people communicate with computers using speech. Automatic
speech recognition (ASR) transforms speech into text, while automatic speech synthesis [or …

[BOOK][B] Audio bandwidth extension: application of psychoacoustics, signal processing and loudspeaker design

E Larsen, RM Aarts - 2005 - books.google.com
Bandwidth extension (BWE) refers to various methods that increase either the perceived or
real frequency spectrum (bandwidth) of audio signals. Such frequency extension is …