SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Deep Speech: Scaling up end-to-end speech recognition

A Hannun, C Case, J Casper, B Catanzaro… - arXiv preprint arXiv …, 2014 - arxiv.org
We present a state-of-the-art speech recognition system developed using end-to-end deep
learning. Our architecture is significantly simpler than traditional speech systems, which rely …

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Wav2letter++: A fast open-source speech recognition system

V Pratap, A Hannun, Q Xu, J Cai, J Kahn… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
This paper introduces wav2letter++, a fast open-source deep learning speech recognition
framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for …

Fully convolutional speech recognition

N Zeghidour, Q Xu, V Liptchinsky, N Usunier… - arXiv preprint arXiv …, 2018 - arxiv.org
Current state-of-the-art speech recognition systems build on recurrent neural networks for
acoustic and/or language modeling, and rely on feature extraction pipelines to extract mel …

Deep Voice: Real-time neural text-to-speech

SÖ Arık, M Chrzanowski, A Coates… - International …, 2017 - proceedings.mlr.press
We present Deep Voice, a production-quality text-to-speech system constructed
entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end …

ESPnet: End-to-end speech processing toolkit

S Watanabe, T Hori, S Karita, T Hayashi… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper introduces a new open source platform for end-to-end speech processing named
ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and …

SpecAugment: A simple data augmentation method for automatic speech recognition

DS Park, W Chan, Y Zhang, CC Chiu, B Zoph… - arXiv preprint arXiv …, 2019 - arxiv.org
We present SpecAugment, a simple data augmentation method for speech recognition.
SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank …

Exploring convolutional neural network structures and optimization techniques for speech recognition

O Abdel-Hamid, L Deng, D Yu - Interspeech, 2013 - Citeseer
Recently, convolutional neural networks (CNNs) have been shown to outperform the
standard fully connected deep neural networks within the hybrid deep neural …

FunCodec: A fundamental, reproducible and integrable open-source toolkit for neural speech codec

Z Du, S Zhang, K Hu, S Zheng - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an
extension of the open-source speech processing toolkit FunASR. FunCodec provides …