Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Confidence estimation for attention-based sequence-to-sequence models for speech recognition

Q Li, D Qiu, Y Zhang, B Li, Y He… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
For various speech-related tasks, confidence scores from a speech recogniser are a useful
measure to assess the quality of transcriptions. In traditional hidden Markov model-based …

Conformer based elderly speech recognition system for Alzheimer's disease detection

T Wang, J Deng, M Geng, Z Ye, S Hu, Y Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay
further progression. This paper presents the development of a state-of-the-art Conformer …

Online hybrid CTC/attention end-to-end automatic speech recognition architecture

H Miao, G Cheng, P Zhang… - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org
Recently, there has been increasing progress in end-to-end automatic speech recognition
(ASR) architecture, which transcribes speech to text without any pre-trained alignments. One …

Tree-constrained pointer generator for end-to-end contextual speech recognition

G Sun, C Zhang, PC Woodland - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …

Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring

Q Li, C Zhang, PC Woodland - Speech Communication, 2023 - Elsevier
The traditional hybrid deep neural network (DNN)–hidden Markov model (HMM) system and
attention-based encoder–decoder (AED) model are both commonly used automatic speech …

Minimising biasing word errors for contextual ASR with the tree-constrained pointer generator

G Sun, C Zhang, PC Woodland - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Contextual knowledge is essential for reducing speech recognition errors on high-valued
long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) …

Massive End-to-End Models for Short Search Queries

W Wang, R Prabhavalkar, D Hwang, Q Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we investigate two popular end-to-end automatic speech recognition (ASR)
models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN …

Residual energy-based models for end-to-end speech recognition

Q Li, Y Zhang, B Li, L Cao, PC Woodland - arXiv preprint arXiv …, 2021 - arxiv.org
End-to-end models with auto-regressive decoders have shown impressive results for
automatic speech recognition (ASR). These models formulate the sequence-level probability …

Improving fast-slow encoder based transducer with streaming deliberation

K Li, J Mahadeokar, J Guo, Y Shi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This paper introduces a fast-slow encoder based transducer with streaming deliberation for
end-to-end automatic speech recognition. We aim to improve the recognition accuracy of the …