Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Intermediate loss regularization for CTC-based speech recognition

J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …

Non-autoregressive transformer for speech recognition

N Chen, S Watanabe, J Villalba… - IEEE Signal …, 2020 - ieeexplore.ieee.org
Very deep transformers outperform conventional bi-directional long short-term memory
networks for automatic speech recognition (ASR) by a significant margin. However, being …

Zero-query adversarial attack on black-box automatic speech recognition systems

Z Fang, T Wang, L Zhao, S Zhang, B Li, Y Ge… - Proceedings of the …, 2024 - dl.acm.org
In recent years, extensive research has been conducted on the vulnerability of ASR systems,
revealing that black-box adversarial example attacks pose significant threats to real-world …

Relaxing the conditional independence assumption of CTC-based ASR by conditioning on intermediate predictions

J Nozaki, T Komatsu - arXiv preprint arXiv:2104.02724, 2021 - arxiv.org
This paper proposes a method to relax the conditional independence assumption of
connectionist temporal classification (CTC)-based automatic speech recognition (ASR) …

Fast end-to-end speech recognition via non-autoregressive models and cross-modal knowledge transferring from BERT

Y Bai, J Yi, J Tao, Z Tian, Z Wen… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Attention-based encoder-decoder (AED) models have achieved promising performance in
speech recognition. However, because the decoder predicts text tokens (such as characters …

Exploring the integration of IoT and Generative AI in English language education: Smart tools for personalized learning experiences

W Dong, D Pan, S Kim - Journal of Computational Science, 2024 - Elsevier
English language education is undergoing a transformative shift, propelled by
advancements in technology. This research explores the integration of the Internet of Things …