An overview of variational autoencoders for source separation, finance, and bio-signal applications

A Singh, T Ogunfunmi - Entropy, 2021 - mdpi.com
Autoencoders are a self-supervised learning system where, during training, the output is an
approximation of the input. Typically, autoencoders have three parts: Encoder (which …

Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …

Speech enhancement with score-based generative models in the complex STFT domain

S Welker, J Richter, T Gerkmann - arXiv preprint arXiv:2203.17004, 2022 - arxiv.org
Score-based generative models (SGMs) have recently shown impressive results for difficult
generative tasks such as the unconditional and conditional generation of natural images …

Unsupervised speech enhancement using dynamical variational autoencoders

X Bie, S Leglaive, X Alameda-Pineda… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with
latent variables, dedicated to model time series of high-dimensional data. DVAEs can be …

RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion

G Chi, Z Yang, C Wu, J Xu, Y Gao, Y Liu… - Proceedings of the 30th …, 2024 - dl.acm.org
Along with AIGC shines in CV and NLP, its potential in the wireless domain has also
emerged in recent years. Yet, existing RF-oriented generative solutions are ill-suited for …

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

S Chowdhury, S Nag, KJ Joseph… - Proceedings of the …, 2024 - openaccess.thecvf.com
Music is a universal language that can communicate emotions and feelings. It forms an
essential part of the whole spectrum of creative media ranging from movies to social media …

Deep learning-based stereophonic acoustic echo suppression without decorrelation

L Cheng, R Peng, A Li, C Zheng, X Li - The Journal of the Acoustical …, 2021 - pubs.aip.org
Traditional stereophonic acoustic echo cancellation algorithms need to estimate acoustic
echo paths from stereo loudspeakers to a microphone, which often suffers from the …

A multi-dimensional deep structured state space approach to speech enhancement using small-footprint models

PJ Ku, CHH Yang, SM Siniscalchi, CH Lee - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a multi-dimensional structured state space (S4) approach to speech
enhancement. To better capture the spectral dependencies across the frequency axis, we …

Disentanglement learning for variational autoencoders applied to audio-visual speech enhancement

G Carbajal, J Richter… - 2021 IEEE Workshop on …, 2021 - ieeexplore.ieee.org
Recently, the standard variational autoencoder has been successfully used to learn a
probabilistic prior over speech signals, which is then used to perform speech enhancement …

Variance-preserving-Based interpolation diffusion models for speech enhancement

Z Guo, J Du, CH Lee, Y Gao, W Zhang - arXiv preprint arXiv:2306.08527, 2023 - arxiv.org
The goal of this study is to implement diffusion models for speech enhancement (SE). The
first step is to emphasize the theoretical foundation of variance-preserving (VP)-based …