High fidelity speech synthesis with adversarial networks

M Bińkowski, J Donahue, S Dieleman, A Clark… - arXiv preprint arXiv …, 2019 - arxiv.org
Generative adversarial networks have seen rapid development in recent years and have led
to remarkable improvements in generative modelling of images. However, their application …

[BOOK] Fundamentals of music processing: Audio, analysis, algorithms, applications

M Müller - 2015 - Springer
This textbook provides both profound technological knowledge and a comprehensive
treatment of essential topics in music processing and music information retrieval. Including …

[PDF] Signal reconstruction from STFT magnitude: A state of the art

N Sturmel, L Daudet - International conference on digital audio effects …, 2011 - dafx.de
This paper presents a review on techniques for signal reconstruction without phase, i.e., when
only the spectrogram (the squared magnitude of the Short Time Fourier Transform) of the …

Learning spectral mapping for speech dereverberation and denoising

K Han, Y Wang, DL Wang, WS Woods… - … on Audio, Speech …, 2015 - ieeexplore.ieee.org
In real-world environments, human speech is usually distorted by both reverberation and
background noise, which have negative effects on speech intelligibility and speech quality …

WaveGuard: Understanding and mitigating audio adversarial examples

S Hussain, P Neekhara, S Dubnov, J McAuley… - 30th USENIX security …, 2021 - usenix.org
There has been a recent surge in adversarial attacks on deep learning based automatic
speech recognition (ASR) systems. These attacks pose new challenges to deep learning …

A fast Griffin-Lim algorithm

N Perraudin, P Balazs… - 2013 IEEE workshop on …, 2013 - ieeexplore.ieee.org
In this paper, we present a new algorithm to estimate a signal from its short-time Fourier
transform modulus (STFTM). This algorithm is computationally simple and is obtained by an …

Differentiable consistency constraints for improved deep speech enhancement

S Wisdom, JR Hershey, K Wilson… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
In recent years, deep networks have led to dramatic improvements in speech enhancement
by framing it as a data-driven pattern recognition problem. In many modern enhancement …

A noniterative method for reconstruction of phase from STFT magnitude

Z Průša, P Balazs… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
A noniterative method for the reconstruction of the short-time Fourier transform (STFT) phase
from the magnitude is presented. The method is based on the direct relationship between …

SpeedySpeech: Efficient neural speech synthesis

J Vainer, O Dušek - arXiv preprint arXiv:2008.03802, 2020 - arxiv.org
While recent neural sequence-to-sequence models have greatly improved the quality of
speech synthesis, there has not been a system capable of fast training, fast inference and …

On learning spectral masking for single channel speech enhancement using feedforward and recurrent neural networks

N Saleem, MI Khattak, M Al-Hasan, AB Qazi - IEEE Access, 2020 - ieeexplore.ieee.org
Human speech in real-world environments is typically degraded by the background noise.
They have a negative impact on perceptual speech quality and intelligibility which causes …