SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Attention is all you need in speech separation

C Subakan, M Ravanelli, S Cornell… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Recurrent Neural Networks (RNNs) have long been the dominant architecture in
sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow …

TorchAudio: Building blocks for audio and speech processing

YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …

LibriMix: An open-source dataset for generalizable speech separation

J Cosentino, M Pariente, S Cornell, A Deleforge… - arXiv preprint arXiv …, 2020 - arxiv.org
In recent years, wsj0-2mix has become the reference dataset for single-channel speech
separation. Most deep learning-based speech separation models today are benchmarked …

Investigating self-supervised learning for speech enhancement and separation

Z Huang, S Watanabe, S Yang, P García… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement and separation are two fundamental tasks for robust speech
processing. Speech enhancement suppresses background noise while speech separation …

How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR

K Iwamoto, T Ochiai, M Delcroix, R Ikeshita… - arXiv preprint arXiv …, 2022 - arxiv.org
It is challenging to improve automatic speech recognition (ASR) performance in noisy
conditions with single-channel speech enhancement (SE). In this paper, we investigate the …

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

An efficient encoder-decoder architecture with top-down attention for speech separation

K Li, R Yang, X Hu - arXiv preprint arXiv:2209.15200, 2022 - arxiv.org
Deep neural networks have shown excellent prospects in speech separation tasks.
However, obtaining good results while keeping a low model complexity remains challenging …

Speech separation using an asynchronous fully recurrent convolutional neural network

X Hu, K Li, W Zhang, Y Luo… - Advances in …, 2021 - proceedings.neurips.cc
Recent advances in the design of neural network architectures, in particular those
specialized in modeling sequences, have provided significant improvements in speech …

ESPnet2-TTS: Extending the edge of TTS research

T Hayashi, R Yamamoto, T Yoshimura, P Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features …