SinDDM: A single image denoising diffusion model

V Kulikov, S Yadin, M Kleiner… - … conference on machine …, 2023 - proceedings.mlr.press
Denoising diffusion models (DDMs) have led to staggering performance leaps in image
generation, editing and restoration. However, existing DDMs use very large datasets for …

Taming visually guided sound generation

V Iashin, E Rahtu - arXiv preprint arXiv:2110.08791, 2021 - arxiv.org
Recent advances in visually-induced audio generation are based on sampling short, low-
fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the …

Multi-instrument music synthesis with spectrogram diffusion

C Hawthorne, I Simon, A Roberts, N Zeghidour… - arXiv preprint arXiv …, 2022 - arxiv.org
An ideal music synthesizer should be both interactive and expressive, generating high-
fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …

Solving audio inverse problems with a diffusion model

E Moliner, J Lehtinen, V Välimäki - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained,
be used for solving various audio inverse problems in a problem-agnostic setting …

Deep internal learning: Deep learning from a single input

T Tirer, R Giryes, SY Chun, YC Eldar - arXiv preprint arXiv:2312.07425, 2023 - arxiv.org
Deep learning in general focuses on training a neural network from large labeled datasets.
Yet, in many cases there is value in training a network just from the input at hand. This may …

Learning to generate 3D shapes from a single example

R Wu, C Zheng - arXiv preprint arXiv:2208.02946, 2022 - arxiv.org
Existing generative models for 3D shapes are typically trained on a large 3D dataset, often
of a specific object category. In this paper, we investigate the deep generative model that …

DDSP-based singing vocoders: A new subtractive-based synthesizer and a comprehensive evaluation

DY Wu, WY Hsiao, FR Yang, O Friedman… - arXiv preprint arXiv …, 2022 - arxiv.org
A vocoder is a conditional audio generation model that converts acoustic features such as
mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal …

Diffusion-based audio inpainting

E Moliner, V Välimäki - arXiv preprint arXiv:2305.15266, 2023 - arxiv.org
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most
existing methods produce plausible reconstructions when the gap lengths are short, but …

NoiseBandNet: controllable time-varying neural synthesis of sound effects using filterbanks

A Barahona-Ríos, T Collins - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Controllable neural audio synthesis of sound effects is a challenging task due to the
potential scarcity and spectro-temporal variance of the data. Differentiable digital signal …

Deep prior-based audio inpainting using multi-resolution harmonic convolutional neural networks

F Miotello, M Pezzoli, L Comanducci… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this manuscript, we propose a novel method to perform audio inpainting, i.e., the
restoration of audio signals presenting multiple missing parts. Audio inpainting can be …