State-of-the-art speech recognition with sequence-to-sequence models

CC Chiu, TN Sainath, Y Wu… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS)
subsume the acoustic, pronunciation and language model components of a traditional …

Streaming automatic speech recognition with the transformer model

N Moritz, T Hori, J Le - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art
results in end-to-end automatic speech recognition (ASR). Recently, the transformer …

An online attention-based model for speech recognition

R Fan, P Zhou, W Chen, J Jia, G Liu - arXiv preprint arXiv:1811.05247, 2018 - arxiv.org
Attention-based end-to-end models such as Listen, Attend and Spell (LAS) simplify the
whole pipeline of traditional automatic speech recognition (ASR) systems and become …

Multi-dialect speech recognition with a single sequence-to-sequence model

B Li, TN Sainath, KC Sim, M Bacchiani… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Sequence-to-sequence models provide a simple and elegant solution for building speech
recognition systems by folding separate components of a typical system, namely acoustic …

A comparison of techniques for language model integration in encoder-decoder speech recognition

S Toshniwal, A Kannan, CC Chiu, Y Wu… - 2018 IEEE spoken …, 2018 - ieeexplore.ieee.org
Attention-based recurrent neural encoder-decoder models present an elegant solution to the
automatic speech recognition problem. This approach folds the acoustic model …

Attention-based end-to-end speech recognition on voice search

C Shan, J Zhang, Y Wang, L Xie - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Recently, there has been a growing interest in end-to-end speech recognition that directly
transcribes speech to text without any predefined alignments. In this paper, we explore the …

Building competitive direct acoustics-to-word models for English conversational speech recognition

K Audhkhasi, B Kingsbury… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received
increasing attention compared to conventional subword based automatic speech …

Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss

Q Zhang, H Lu, H Sak, A Tripathi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this paper we present an end-to-end speech recognition model with Transformer
encoders that can be used in a streaming speech recognition system. Transformer …

Minimum latency training strategies for streaming sequence-to-sequence ASR

H Inaguma, Y Gaur, L Lu, J Li… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Recently, a few novel streaming attention-based sequence-to-sequence (S2S) models have
been proposed to perform online speech recognition with linear-time decoding complexity …

A spelling correction model for end-to-end speech recognition

J Guo, TN Sainath, RJ Weiss - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Attention-based sequence-to-sequence models for speech recognition jointly train an
acoustic model, language model (LM), and alignment mechanism using a single neural …