Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

An overview of end-to-end automatic speech recognition

D Wang, X Wang, S Lv - Symmetry, 2019 - mdpi.com
Automatic speech recognition, especially large vocabulary continuous speech recognition,
is an important issue in the field of machine learning. For a long time, the hidden Markov …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

A comparative study on Transformer vs RNN in speech applications

S Karita, N Chen, T Hayashi, T Hori… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRU

A Shewalkar, D Nyavanandi, SA Ludwig - Journal of Artificial …, 2019 - sciendo.com
Deep Neural Networks (DNN) are nothing but neural networks with many hidden
layers. DNNs are becoming popular in automatic speech recognition tasks which combines …

Neuro-symbolic speech understanding in aircraft maintenance metaverse

A Siyaev, GS Jo - IEEE Access, 2021 - ieeexplore.ieee.org
In the emerging world of metaverses, it is essential for speech communication systems to be
aware of context to interact with virtual assets in the 3D world. This paper proposes the …

Hybrid CTC/attention architecture for end-to-end speech recognition

S Watanabe, T Hori, S Kim, JR Hershey… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
Conventional automatic speech recognition (ASR) based on a hidden Markov model
(HMM)/deep neural network (DNN) is a very complicated system consisting of various …

Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration

T Nakatani - Proc. INTERSPEECH, 2019 - isca-archive.org
The state-of-the-art neural network architecture named Transformer has been used
successfully for many sequence-to-sequence transformation tasks. The advantage of this …

A comparison of Transformer and LSTM encoder decoder models for ASR

A Zeyer, P Bahar, K Irie, R Schlüter… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
We present competitive results using a Transformer encoder-decoder-attention model for
end-to-end speech recognition needing less training time compared to a similarly …