Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both machine learning and …
Transformers have recently dominated the ASR field. Although able to yield good performance, they involve an autoregressive (AR) decoder to generate tokens one by one …
This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) …
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference latency at the cost of an accuracy drop compared to …
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet), an end-to-end speech processing toolkit. This project was initiated in December 2017 to …
This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC). Our formulation relaxes the …
F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments of models trained with the RNN-T loss in ESPnet. It involves the use of various architectures such as the recently proposed Conformer …
While Transformers have achieved promising results in end-to-end (E2E) automatic speech recognition (ASR), their autoregressive (AR) structure becomes a bottleneck for speeding up …
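A common thread in the papers above is CTC-style non-autoregressive decoding, where every frame's label is predicted independently (the conditional independence assumption mentioned in several snippets) and the frame sequence is then collapsed into an output sequence. A minimal sketch of greedy (best-path) CTC decoding, assuming the common convention that index 0 is the blank label; the label indices in the example are purely illustrative, not from any of the papers:

```python
BLANK = 0  # CTC blank index (a common convention; some toolkits differ)

def ctc_greedy_decode(frame_argmax):
    """Collapse repeated labels, then drop blanks.

    frame_argmax: per-frame most likely label indices, produced in a single
    parallel pass with no left-to-right dependence between output tokens.
    """
    decoded = []
    prev = None
    for label in frame_argmax:
        # Keep a label only when it differs from the previous frame's label
        # (collapsing repeats) and is not the blank symbol.
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return decoded

# Frames with blanks and repetitions collapse to a short label sequence:
print(ctc_greedy_decode([0, 3, 3, 0, 5, 0, 7, 7, 0, 7, 9, 0]))  # → [3, 5, 7, 7, 9]
```

Because every frame is decoded at once, this runs in a single forward pass, which is the speedup these NAR papers exploit; the accuracy gap they address comes from the same independence assumption the sketch illustrates.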