Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Torchaudio: Building blocks for audio and speech processing

YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …

Wav2vec-switch: Contrastive learning from original-noisy speech pairs for robust speech recognition

Y Wang, J Li, H Wang, Y Qian… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to
learn good speech representations from a large amount of unlabeled speech for the …

End-to-end Arabic speech recognition: A review

AA Abdelhamid, HA Alsayadi, I Hegazy… - Proceedings of the …, 2020 - researchgate.net
Automatic speech recognition (ASR) is a crucial field of science due to its massive
applications that can be developed to help humans to improve their daily life tasks. Despite …

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

S Watanabe, F Boyer, X Chang, P Guo… - 2021 IEEE Data …, 2021 - ieeexplore.ieee.org
This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …

Arabic speech recognition using end-to-end deep learning

HA Alsayadi, AA Abdelhamid, I Hegazy… - IET Signal …, 2021 - Wiley Online Library
Arabic automatic speech recognition (ASR) methods with diacritics have the ability to be
integrated with other systems better than Arabic ASR methods without diacritics. In this work …

Efficient sequence transduction by jointly predicting tokens and durations

H Xu, F Jia, S Majumdar, H Huang… - International …, 2023 - proceedings.mlr.press
This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for
sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by …

Wake word detection with streaming transformers

Y Wang, H Lv, D Povey, L Xie… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Modern wake word detection systems usually rely on neural networks for acoustic modeling.
Transformers have recently shown superior performance over LSTM and convolutional …