Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community has seen a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which was first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …

Relaxing the conditional independence assumption of CTC-based ASR by conditioning on intermediate predictions

J Nozaki, T Komatsu - arXiv preprint arXiv:2104.02724, 2021 - arxiv.org
This paper proposes a method to relax the conditional independence assumption of
connectionist temporal classification (CTC)-based automatic speech recognition (ASR) …

A comparative study on non-autoregressive modelings for speech-to-text generation

Y Higuchi, N Chen, Y Fujita, H Inaguma… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence,
which significantly reduces inference time at the cost of an accuracy drop compared to …

The 2020 espnet update: new features, broadened applications, performance improvements, and future plans

S Watanabe, F Boyer, X Chang, P Guo… - 2021 IEEE Data …, 2021 - ieeexplore.ieee.org
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …

BERT meets CTC: New formulation of end-to-end speech recognition with pre-trained masked language model

Y Higuchi, B Yan, S Arora, T Ogawa… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that
adapts BERT for connectionist temporal classification (CTC). Our formulation relaxes the …

A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies

F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as recently proposed Conformer …

Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

K Deng, Z Yang, S Watanabe, Y Higuchi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
While Transformers have achieved promising results in end-to-end (E2E) automatic speech
recognition (ASR), their autoregressive (AR) structure becomes a bottleneck for speeding up …

Research status and prospects of Transformer in speech recognition tasks

X Zhang, Z Ma, Z Liu, F Zhu… - Journal of Frontiers of …, 2021 - search.ebscohost.com
As a new deep learning algorithm framework, Transformer has attracted growing attention
from researchers and has become a current research hotspot. The self-attention mechanism in the Transformer model is inspired by the way humans attend only to important things …