Espresso: A fast end-to-end neural speech recognition toolkit

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 440 Related articles All 7 versions

[PDF] arxiv.org

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Cited by 747 Related articles All 5 versions

[PDF] ieee.org

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 176 Related articles All 6 versions

[PDF] arxiv.org

Torchaudio: Building blocks for audio and speech processing

YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …

Cited by 215 Related articles All 7 versions

[PDF] arxiv.org

Wav2vec-switch: Contrastive learning from original-noisy speech pairs for robust speech recognition

Y Wang, J Li, H Wang, Y Qian… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to
learn good speech representations from a large amount of unlabeled speech for the …

Cited by 70 Related articles All 5 versions

[PDF] researchgate.net

[PDF] End-to-end Arabic speech recognition: A review

AA Abdelhamid, HA Alsayadi, I Hegazy… - Proceedings of the …, 2020 - researchgate.net

Automatic speech recognition (ASR) is a crucial field of science due to its massive
applications that can be developed to help humans to improve their daily life tasks. Despite …

Cited by 30 Related articles All 6 versions

[PDF] arxiv.org

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

S Watanabe, F Boyer, X Chang, P Guo… - 2021 IEEE Data …, 2021 - ieeexplore.ieee.org

This paper describes the recent development of ESPnet (https://github.com/espnet/espnet),
an end-to-end speech processing toolkit. This project was initiated in December 2017 to …

Cited by 57 Related articles All 7 versions

[PDF] wiley.com

Arabic speech recognition using end‐to‐end deep learning

HA Alsayadi, AA Abdelhamid, I Hegazy… - IET Signal …, 2021 - Wiley Online Library

Arabic automatic speech recognition (ASR) methods with diacritics have the ability to be
integrated with other systems better than Arabic ASR methods without diacritics. In this work …

Cited by 58 Related articles All 8 versions

[PDF] mlr.press

Efficient sequence transduction by jointly predicting tokens and durations

H Xu, F Jia, S Majumdar, H Huang… - International …, 2023 - proceedings.mlr.press

This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for
sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by …

Cited by 12 Related articles All 8 versions

[PDF] arxiv.org

Wake word detection with streaming transformers

Y Wang, H Lv, D Povey, L Xie… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Modern wake word detection systems usually rely on neural networks for acoustic modeling.
Transformers have recently shown superior performance over LSTM and convolutional …

Cited by 43 Related articles All 9 versions
