VocGAN: A high-fidelity real-time vocoder with a hierarchically-nested adversarial network

J Yang, J Lee, Y Kim, H Cho, I Kim - arXiv preprint arXiv:2007.15256, 2020 - arxiv.org
We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently
developed GAN-based vocoder, MelGAN, produces speech waveforms in real-time …

Multi-mode neural speech coding based on deep generative networks

W Xiao, W Liu, M Wang, S Yang, Y Shi, Y Kang… - Proc …, 2023 - researchgate.net
The wideband or super wideband speech is one of the most prominent features in real-time
communication services, with higher resolution spectrum. However, it requires higher …

StyleMelGAN: An efficient high-fidelity adversarial vocoder with temporal adaptive normalization

A Mustafa, N Pia, G Fuchs - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
In recent years, neural vocoders have surpassed classical speech generation approaches in
naturalness and perceptual quality of the synthesized speech. Computationally heavy …

End-to-end neural speech coding for real-time communications

X Jiang, X Peng, C Zheng, H Xue… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Deep-learning based methods have shown their advantages in audio coding over traditional
ones but limited attention has been paid on real-time communications (RTC). This paper …

Architecture for variable bitrate neural speech codec with configurable computation complexity

T Jayashankar, T Koehler, K Kalgaonkar… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Low bitrate speech codecs have become an area of intense research. Traditional speech
codecs, which use signal processing methods to encode and decode speech, often suffer …

Neurally optimized decoder for low bitrate speech codec

HY Kim, JW Yoon, WI Cho… - IEEE Signal Processing …, 2021 - ieeexplore.ieee.org
Recently, a conventional neural decoder for speech codec has shown promising
performance. However, it typically requires some prior knowledge of decoding such as bit …

Generative De-Quantization for Neural Speech Codec Via Latent Diffusion

H Yang, I Jang, M Kim - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
End-to-end speech coding models achieve high coding gains by learning compact yet
expressive features and a powerful decoder in a single network. A challenging problem as …

Disentangled feature learning for real-time neural speech coding

X Jiang, X Peng, Y Zhang, Y Lu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Recently end-to-end neural audio/speech coding has shown its great potential to outperform
traditional signal analysis based audio codecs. This is mostly achieved by following the VQ …

Progressive multi-stage neural audio coding with guided references

C Lee, H Lim, J Lee, I Jang… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this paper, we propose an effective multi-stage neural audio coding algorithm that
encodes full-band audio signals (up to 20 kHz) using an end-to-end training criterion. By …

Composition of deep and spiking neural networks for very low bit rate speech coding

M Cernak, A Lazaridis, A Asaei… - IEEE/ACM Transactions …, 2016 - ieeexplore.ieee.org
Most current very low bit rate (VLBR) speech coding systems use hidden Markov model
(HMM) based speech recognition and synthesis techniques. This allows transmission of …