Soundstream: An end-to-end neural audio codec

T Wu, YJ Yuan, LX Zhang, J Yang, YP Cao… - Computational Visual …, 2024 - Springer

The emergence of 3D Gaussian splatting (3DGS) has greatly accelerated rendering in novel
view synthesis. Unlike neural implicit representations like neural radiance fields (NeRFs) …

Cited by 51 Related articles All 3 versions

[PDF] researchgate.net

Edge computing on IoT for machine signal processing and fault diagnosis: A review

S Lu, J Lu, K An, X Wang, Q He - IEEE Internet of Things …, 2023 - ieeexplore.ieee.org

Edge computing is an emerging paradigm that offloads the computations and analytics
workloads onto the Internet of Things (IoT) edge devices to accelerate the computation …

Cited by 132 Related articles All 2 versions

[PDF] neurips.cc

Simple and controllable music generation

J Copet, F Kreuk, I Gat, T Remez… - Advances in …, 2024 - proceedings.neurips.cc

We tackle the task of conditional music generation. We introduce MusicGen, a single
Language Model (LM) that operates over several streams of compressed discrete music …

Cited by 403 Related articles All 9 versions

[PDF] arxiv.org

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called Vall-E) using discrete codes derived from …

Cited by 581 Related articles All 3 versions

[PDF] mlr.press

Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models

R Huang, J Huang, D Yang, Y Ren… - International …, 2023 - proceedings.mlr.press

Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …

Cited by 289 Related articles All 7 versions

[PDF] arxiv.org

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

Cited by 619 Related articles All 3 versions

[PDF] arxiv.org

AudioLM: A language modeling approach to audio generation

Z Borsos, R Marinier, D Vincent… - … ACM transactions on …, 2023 - ieeexplore.ieee.org

We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …

Cited by 541 Related articles All 5 versions

[PDF] neurips.cc

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc

Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Cited by 220 Related articles All 8 versions

[PDF] neurips.cc

High-fidelity audio compression with improved RVQGAN

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2024 - proceedings.neurips.cc

Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

Cited by 205 Related articles All 5 versions

[PDF] mit.edu

Speak, read and prompt: High-fidelity text-to-speech with minimal supervision

E Kharitonov, D Vincent, Z Borsos… - Transactions of the …, 2023 - direct.mit.edu

We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained
with minimal supervision. By combining two types of discrete speech representations, we …

Cited by 172 Related articles All 5 versions
