Our previous work, the unified source-filter GAN (uSFGAN) vocoder, introduced a novel architecture based on the source-filter theory into the parallel waveform generative …
We explore options to use Transformer networks in neural transducer for end-to-end speech recognition. Transformer networks use self-attention for sequence modeling and come with …
Z Zhao, H Liu, T Fingscheidt - IEEE/ACM Transactions on …, 2018 - ieeexplore.ieee.org
Enhancing coded speech suffering from far-end acoustic background noise, quantization noise, and potentially transmission errors is a challenging task. In this paper, we propose …
Y Ai, XH Jiang, YX Lu, HP Du, ZH Ling - arXiv preprint arXiv:2402.10533, 2024 - arxiv.org
This paper introduces a novel neural audio codec targeting high waveform sampling rates and low bitrates named APCodec, which seamlessly integrates the strengths of parametric …
L Lu, S Renals - IEEE/ACM Transactions on Audio, Speech …, 2017 - ieeexplore.ieee.org
State-of-the-art speech recognition systems typically employ neural network acoustic models. However, compared to Gaussian mixture models, deep neural network (DNN) …
Y Qian, PC Woodland - 2016 IEEE spoken language …, 2016 - ieeexplore.ieee.org
This paper describes the extension and optimisation of our previous work on very deep convolutional neural networks (CNNs) for effective recognition of noisy speech in the Aurora …
X Meng, C Liu, Z Zhang, D Wang - 2014 IEEE China Summit & …, 2014 - ieeexplore.ieee.org
Deep neural networks (DNN) have gained remarkable success in speech recognition, partially attributed to their flexibility in learning complex patterns of speech signals. This …
Y Miao, M Gowayyed, F Metze - 2015 IEEE workshop on …, 2015 - ieeexplore.ieee.org
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR …
Z Tian, J Yi, J Tao, Y Bai, Z Wen - arXiv preprint arXiv:1909.13037, 2019 - arxiv.org
Recurrent neural network transducers (RNN-T) have been successfully applied in end-to-end speech recognition. However, the recurrent structure makes it difficult for parallelization …