Very deep convolutional networks for end-to-end speech recognition

Y Zhang, W Chan, N Jaitly - … conference on acoustics, speech …, 2017 - ieeexplore.ieee.org
… We explored very deep CNNs for end-to-end speech recognition. We applied Network-in-Network
principles to add depth and nonlinearities to hierarchical RNNs. We also applied …

On the comparison of popular end-to-end models for large scale speech recognition

J Li, Y Wu, Y Gaur, C Wang, R Zhao, S Liu - arXiv preprint arXiv …, 2020 - arxiv.org
… Recently, there has been a strong push to transition from hybrid models to end-to-end (E2E)
models for automatic speech recognition. Currently, there are three promising E2E methods: …

Scaling up online speech recognition using convnets

V Pratap, Q Xu, J Kahn, G Avidov… - arXiv preprint arXiv …, 2020 - arxiv.org
… We design an online end-to-end speech recognition system based on Time-Depth Separable
(TDS) convolutions and Connectionist Temporal Classification (CTC). We improve the …

Towards end-to-end speech recognition with deep convolutional neural networks

Y Zhang, M Pezeshki, P Brakel, S Zhang… - arXiv preprint arXiv …, 2017 - arxiv.org
… To the best of our knowledge, all end-to-end neural speech recognition systems employ
recurrent neural networks in at least some part of the processing pipeline. The most successful …

Deep context: end-to-end contextual speech recognition

G Pundak, TN Sainath, R Prabhavalkar… - 2018 IEEE spoken …, 2018 - ieeexplore.ieee.org
… In this work, we present a novel, all-neural, end-to-end (E2E) ASR system that utilizes such
context. Our approach, which we refer to as Contextual Listen, Attend and Spell (CLAS) jointly…

An overview of end-to-end automatic speech recognition

D Wang, X Wang, S Lv - Symmetry, 2019 - mdpi.com
Deep speech 2: End-to-end speech recognition in english and mandarin. In Proceedings
of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; …

Very deep self-attention networks for end-to-end speech recognition

NQ Pham, TS Nguyen, J Niehues, M Müller… - arXiv preprint arXiv …, 2019 - arxiv.org
… , and we show that a competitive end-to-end ASR model can be achieved solely using …
end-to-end ASR models with the Transformer. Second, in order to facilitate training of very deep

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
… Although their implementation was followed as closely as possible, training end-to-end
quickly exceeded the memory limitations of modern GPUs. To work around these problems, the …

Streaming end-to-end speech recognition for mobile devices

Y He, TN Sainath, R Prabhavalkar… - … Acoustics, Speech …, 2019 - ieeexplore.ieee.org
… on challenging tasks such as voice search. In … end-to-end (E2E) models [10, 11, 12, 13,
14]. Such models replace the traditional components of an ASR system with a single, end-to-end

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
… Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition