SinDDM: A single image denoising diffusion model

V Kulikov, S Yadin, M Kleiner… - … conference on machine …, 2023 - proceedings.mlr.press
Denoising diffusion models (DDMs) have led to staggering performance leaps in image
generation, editing and restoration. However, existing DDMs use very large datasets for …

Taming visually guided sound generation

V Iashin, E Rahtu - arXiv preprint arXiv:2110.08791, 2021 - arxiv.org
Recent advances in visually-induced audio generation are based on sampling short, low-
fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the …

Multi-instrument music synthesis with spectrogram diffusion

C Hawthorne, I Simon, A Roberts, N Zeghidour… - arXiv preprint arXiv …, 2022 - arxiv.org
An ideal music synthesizer should be both interactive and expressive, generating high-
fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …

Solving audio inverse problems with a diffusion model

E Moliner, J Lehtinen, V Välimäki - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained,
be used for solving various audio inverse problems in a problem-agnostic setting …

Deep internal learning: Deep learning from a single input

T Tirer, R Giryes, SY Chun, YC Eldar - arXiv preprint arXiv:2312.07425, 2023 - arxiv.org
Deep learning in general focuses on training a neural network from large labeled datasets.
Yet, in many cases there is value in training a network just from the input at hand. This may …

Learning to generate 3D shapes from a single example

R Wu, C Zheng - arXiv preprint arXiv:2208.02946, 2022 - arxiv.org
Existing generative models for 3D shapes are typically trained on a large 3D dataset, often
of a specific object category. In this paper, we investigate the deep generative model that …

DDSP-based singing vocoders: A new subtractive-based synthesizer and a comprehensive evaluation

DY Wu, WY Hsiao, FR Yang, O Friedman… - arXiv preprint arXiv …, 2022 - arxiv.org
A vocoder is a conditional audio generation model that converts acoustic features such as
mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal …

Diffusion-based audio inpainting

E Moliner, V Välimäki - arXiv preprint arXiv:2305.15266, 2023 - arxiv.org
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most
existing methods produce plausible reconstructions when the gap lengths are short, but …

NoiseBandNet: controllable time-varying neural synthesis of sound effects using filterbanks

A Barahona-Ríos, T Collins - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Controllable neural audio synthesis of sound effects is a challenging task due to the
potential scarcity and spectro-temporal variance of the data. Differentiable digital signal …

Deep prior-based audio inpainting using multi-resolution harmonic convolutional neural networks

F Miotello, M Pezzoli, L Comanducci… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this manuscript, we propose a novel method to perform audio inpainting, i.e., the
restoration of audio signals presenting multiple missing parts. Audio inpainting can be …