Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow …
YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to …
In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked …
Speech enhancement and separation are two fundamental tasks for robust speech processing. Speech enhancement suppresses background noise while speech separation …
K Iwamoto, T Ochiai, M Delcroix, R Ikeshita… - arXiv preprint arXiv …, 2022 - arxiv.org
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the …
We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with the optional …
K Li, R Yang, X Hu - arXiv preprint arXiv:2209.15200, 2022 - arxiv.org
Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping a low model complexity remains challenging …
Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech …
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features …