Related articles - Academic resource search

Speech enhancement for low bit rate speech codec

J Lin, K Kalgaonkar, Q He, X Lei - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Speech codec compresses the input signal into compact bit stream, which is then decoded
at the receiver to generate the best possible perceptual quality. This compression makes …

Cited by 9 Related articles

[PDF] arxiv.org

Framewise WaveGAN: High speed adversarial vocoder in time domain with very low computational complexity

A Mustafa, JM Valin, J Büthe… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

GAN vocoders are currently one of the state-of-the-art methods for building high-quality
neural waveform generative models. However, most of their architectures require dozens of …

Cited by 3 Related articles All 6 versions

[PDF] academia.edu

Low bit-rate speech coding with VQ-VAE and a WaveNet decoder

C Gârbacea, A van den Oord, Y Li… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

In order to efficiently transmit and store speech signals, speech codecs create a minimally
redundant representation of the input signal which is then decoded at the receiver with the …

Cited by 123 Related articles All 5 versions

[PDF] isca-archive.org

[PDF] MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks.

T Kaneko, H Kameoka, K Tanaka, S Seki - Interspeech, 2022 - isca-archive.org

Neural vocoders have recently become popular in text-to-speech synthesis and voice
conversion, increasing the demand for efficient neural vocoders. One successful approach is …

Cited by 5 Related articles All 4 versions

[PDF] researchgate.net

[PDF] Optimization of Deep Neural Network (DNN) Speech Coder Using a Multi Time Scale Perceptual Loss Function.

J Byun, S Shin, J Sung, S Beack, Y Park - Interspeech, 2022 - researchgate.net

In this paper, we propose a method of perceptually optimizing the deep neural network
(DNN)-based speech coder using multi-time-scale perceptual loss functions. We utilize a …

Cited by 7 Related articles All 4 versions

[PDF] arxiv.org

Espresso: A fast end-to-end neural speech recognition toolkit

Y Wang, T Chen, H Xu, S Ding, H Lv… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org

We present Espresso, an open-source, modular, extensible end-to-end neural automatic
speech recognition (ASR) toolkit based on the deep learning library PyTorch and the …

Cited by 86 Related articles All 7 versions

[PDF] nict.go.jp

Multi-stream HiFi-GAN with data-driven waveform decomposition

T Okamoto, T Toda, H Kawai - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org

Although a HiFi-GAN vocoder can synthesize high-fidelity speech waveforms in real time on
CPUs, there is a tradeoff between synthesis quality and inference speed. To increase …

Cited by 15 Related articles All 3 versions

[PDF] arxiv.org

Basis-MelGAN: Efficient neural vocoder based on audio decomposition

Z Liu, Y Qian - arXiv preprint arXiv:2106.13419, 2021 - arxiv.org

Recent studies have shown that neural vocoders based on generative adversarial network
(GAN) can generate audios with high quality. While GAN based neural vocoders have …

Cited by 9 Related articles All 6 versions

[PDF] arxiv.org

WARP-Q: Quality prediction for generative neural speech codecs

WA Jassim, J Skoglund, M Chinen… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Good speech quality has been achieved using waveform matching and parametric
reconstruction coders. Recently developed very low bit rate generative codecs can …

Cited by 24 Related articles All 4 versions

[PDF] arxiv.org

TFGAN: Time and frequency domain based generative adversarial network for high-fidelity speech synthesis

Q Tian, Y Chen, Z Zhang, H Lu, L Chen, L Xie… - arXiv preprint arXiv …, 2020 - arxiv.org

Recently, GAN based speech synthesis methods, such as MelGAN, have become very
popular. Compared to conventional autoregressive based methods, parallel structures …

Cited by 31 Related articles All 2 versions
