Audiodec: An open-source streaming high-fidelity neural audio codec

YC Wu, ID Gebru, D Marković… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
A good audio codec for live applications such as telecommunication is characterized by
three key properties: (1) compression, i.e., the bitrate that is required to transmit the signal …

ViSQOL v3: An open source production ready objective speech and audio metric

M Chinen, FSC Lim, J Skoglund… - … on quality of …, 2020 - ieeexplore.ieee.org
Estimation of perceptual quality in audio and speech is possible using a variety of methods.
The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio …

Speech coding techniques and challenges: A comprehensive literature survey

M Anees - Multimedia Tools and Applications, 2024 - Springer
Speech coding is the process of compressing speech signals for transmission and storage
in communication systems. In recent years, speech coding has become increasingly …

Generative speech coding with predictive variance regularization

WB Kleijn, A Storus, M Chinen, T Denton… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The recent emergence of machine-learning based generative models for speech suggests a
significant reduction in bit rate for speech codecs is possible. However, the performance of …

Review of methods for coding of speech signals

D O'Shaughnessy - EURASIP Journal on Audio, Speech, and Music …, 2023 - Springer
Speech is the most common form of human communication, and many conversations use
digital communication links. For efficient transmission, acoustic speech waveforms are …

A real-time wideband neural vocoder at 1.6 kb/s using LPCNet

JM Valin, J Skoglund - arXiv preprint arXiv:1903.12087, 2019 - arxiv.org
Neural speech synthesis algorithms are a promising new approach for coding speech at
very low bitrate. They have so far demonstrated quality that far exceeds traditional vocoders …

End-to-end neural speech coding for real-time communications

X Jiang, X Peng, C Zheng, H Xue… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Deep-learning-based methods have shown their advantages in audio coding over traditional
ones, but limited attention has been paid to real-time communications (RTC). This paper …

Boomerang: Local sampling on image manifolds using diffusion models

L Luzi, A Siahkoohi, PM Mayer… - arXiv preprint arXiv …, 2022 - arxiv.org
Diffusion models can be viewed as mapping points in a high-dimensional latent space onto
a low-dimensional learned manifold, typically an image manifold. The intermediate values …

Improving Opus low bit rate quality with neural speech synthesis

J Skoglund, JM Valin - arXiv preprint arXiv:1905.04628, 2019 - arxiv.org
The voice mode of the Opus audio coder can compress wideband speech at bit rates
ranging from 6 kb/s to 40 kb/s. However, Opus is at its core a waveform matching coder, and …

APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding

Y Ai, XH Jiang, YX Lu, HP Du, ZH Ling - arXiv preprint arXiv:2402.10533, 2024 - arxiv.org
This paper introduces a novel neural audio codec targeting high waveform sampling rates
and low bitrates named APCodec, which seamlessly integrates the strengths of parametric …