Unsupervised training and directed manual transcription for LVCSR

K Yu, M Gales, L Wang, PC Woodland - Speech Communication, 2010 - Elsevier
A significant cost in obtaining acoustic training data is the generation of accurate
transcriptions. When no transcription is available, unsupervised training techniques must be …

Improving interpretability and regularization in deep learning

C Wu, MJF Gales, A Ragni… - … /ACM Transactions on …, 2017 - ieeexplore.ieee.org
Deep learning approaches yield state-of-the-art performance in a range of tasks, including
automatic speech recognition. However, the highly distributed representation in a deep …

Minimising biasing word errors for contextual ASR with the tree-constrained pointer generator

G Sun, C Zhang, PC Woodland - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Contextual knowledge is essential for reducing speech recognition errors on high-valued
long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) …

An information-extraction approach to speech processing: Analysis, detection, verification, and recognition

CH Lee, SM Siniscalchi - Proceedings of the IEEE, 2013 - ieeexplore.ieee.org
The field of automatic speech recognition (ASR) has enjoyed more than 30 years of
technology advances due to the extensive utilization of the hidden Markov model (HMM) …

HMMs and related speech recognition technologies

S Young - Springer handbook of speech processing, 2008 - Springer
Almost all present-day continuous speech recognition (CSR) systems are based on hidden
Markov models (HMMs). Although the fundamentals of HMM-based CSR have been …

[PDF] Semi-supervised maximum mutual information training of deep neural network acoustic models.

V Manohar, D Povey, S Khudanpur - Interspeech, 2015 - isca-archive.org
Maximum Mutual Information (MMI) is a popular discriminative criterion that has
been used in supervised training of acoustic models for automatic speech recognition …

Improving rare word recognition with LM-aware MWER training

W Wang, T Chen, TN Sainath, E Variani… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E)
models on words rarely seen during training, when used in either the shallow fusion or the …

[DOC] Deep learning for signal and information processing

L Deng, D Yu - Microsoft research monograph, 2013 - microsoft.com
This short monograph contains the material expanded from two tutorials that the authors
gave, one at APSIPA in October 2011 and the other at ICASSP in March 2012. Substantial …

Prosodic feature-based discriminatively trained low resource speech recognition system

T Hasija, V Kadyan, K Guleria, A Alharbi, H Alyami… - Sustainability, 2022 - mdpi.com
Speech recognition has been an active field of research in the last few decades since it
facilitates better human–computer interaction. Native language automatic speech …

Analysis of MLP-based hierarchical phoneme posterior probability estimator

J Pinto, S Garimella, M Magimai-Doss… - … on Audio, Speech …, 2010 - ieeexplore.ieee.org
We analyze a simple hierarchical architecture consisting of two multilayer perceptron (MLP)
classifiers in tandem to estimate the phonetic class conditional probabilities. In this …