Speech enhancement for low bit rate speech codec

J Lin, K Kalgaonkar, Q He, X Lei - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Speech codec compresses the input signal into compact bit stream, which is then decoded
at the receiver to generate the best possible perceptual quality. This compression makes …

Framewise WaveGAN: High speed adversarial vocoder in time domain with very low computational complexity

A Mustafa, JM Valin, J Büthe… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
GAN vocoders are currently one of the state-of-the-art methods for building high-quality
neural waveform generative models. However, most of their architectures require dozens of …

Low bit-rate speech coding with VQ-VAE and a WaveNet decoder

C Gârbacea, A van den Oord, Y Li… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
In order to efficiently transmit and store speech signals, speech codecs create a minimally
redundant representation of the input signal which is then decoded at the receiver with the …

MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks

T Kaneko, H Kameoka, K Tanaka, S Seki - Interspeech, 2022 - isca-archive.org
Neural vocoders have recently become popular in text-to-speech synthesis and voice
conversion, increasing the demand for efficient neural vocoders. One successful approach is …

Optimization of Deep Neural Network (DNN) Speech Coder Using a Multi Time Scale Perceptual Loss Function

J Byun, S Shin, J Sung, S Beack, Y Park - Interspeech, 2022 - researchgate.net
In this paper, we propose a method of perceptually optimizing the deep neural network
(DNN)-based speech coder using multi-time-scale perceptual loss functions. We utilize a …

Espresso: A fast end-to-end neural speech recognition toolkit

Y Wang, T Chen, H Xu, S Ding, H Lv… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
We present Espresso, an open-source, modular, extensible end-to-end neural automatic
speech recognition (ASR) toolkit based on the deep learning library PyTorch and the …

Multi-stream HiFi-GAN with data-driven waveform decomposition

T Okamoto, T Toda, H Kawai - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Although a HiFi-GAN vocoder can synthesize high-fidelity speech waveforms in real time on
CPUs, there is a tradeoff between synthesis quality and inference speed. To increase …

Basis-MelGAN: Efficient neural vocoder based on audio decomposition

Z Liu, Y Qian - arXiv preprint arXiv:2106.13419, 2021 - arxiv.org
Recent studies have shown that neural vocoders based on generative adversarial network
(GAN) can generate audios with high quality. While GAN based neural vocoders have …

WARP-Q: Quality prediction for generative neural speech codecs

WA Jassim, J Skoglund, M Chinen… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Good speech quality has been achieved using waveform matching and parametric
reconstruction coders. Recently developed very low bit rate generative codecs can …

TFGAN: Time and frequency domain based generative adversarial network for high-fidelity speech synthesis

Q Tian, Y Chen, Z Zhang, H Lu, L Chen, L Xie… - arXiv preprint arXiv …, 2020 - arxiv.org
Recently, GAN based speech synthesis methods, such as MelGAN, have become very
popular. Compared to conventional autoregressive based methods, parallel structures …