A deep generative model of speech complex spectrograms

AA Nugraha, K Sekiguchi… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
This paper proposes an approach to the joint modeling of the short-time Fourier transform
magnitude and phase spectrograms with a deep generative model. We assume that the …

Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network

S Takamichi, Y Saito, N Takamune… - … on Acoustic Signal …, 2018 - ieeexplore.ieee.org
This paper presents a deep neural network (DNN)-based phase reconstruction from
amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram …

STFT spectral loss for training a neural speech waveform model

S Takaki, T Nakashika, X Wang… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
This paper proposes a new loss using short-time Fourier transform (STFT) spectra for the
aim of training a high-performance neural speech waveform model that predicts raw …

Generative adversarial network-based approach to signal reconstruction from magnitude spectrogram

K Oyamada, H Kameoka, T Kaneko… - 2018 26th European …, 2018 - ieeexplore.ieee.org
In this paper, we address the problem of reconstructing a time-domain signal (or a phase
spectrogram) solely from a magnitude spectrogram. Since magnitude spectrograms do not …

A fully convolutional neural network for complex spectrogram processing in speech enhancement

Z Ouyang, H Yu, WP Zhu… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
In this paper we propose a fully convolutional neural network (CNN) for complex
spectrogram processing in speech enhancement. The proposed CNN consists of one …

Generative adversarial network-based postfilter for STFT spectrograms

T Kaneko, S Takaki, H Kameoka… - Interspeech 2017, 2017 - research.ed.ac.uk
We propose a learning-based postfilter to reconstruct the high-fidelity spectral texture in
short-term Fourier transform (STFT) spectrograms. In speech-processing systems, such as …

Acoustic modeling of speech waveform based on multi-resolution, neural network signal processing

Z Tüske, R Schlüter, H Ney - 2018 IEEE international …, 2018 - ieeexplore.ieee.org
Recently, several papers have demonstrated that neural networks (NN) are able to perform
the feature extraction as part of the acoustic model. Motivated by the Gammatone feature …

Binary coding of speech spectrograms using a deep auto-encoder

L Deng, ML Seltzer, D Yu, A Acero… - … annual conference of …, 2010 - dub.ucsd.edu
This paper reports our recent exploration of the layer-by-layer learning strategy for training a
multi-layer generative model of patches of speech spectrograms. The top layer of the …

Multi-stream acoustic modelling using raw real and imaginary parts of the Fourier transform

E Loweimi, Z Yue, P Bell, S Renals… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
In this paper, we investigate multi-stream acoustic modelling using the raw real and
imaginary parts of the Fourier transform of speech signals. Using the raw magnitude …

Speech acoustic modelling from raw phase spectrum

E Loweimi, Z Cvetkovic, P Bell… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Magnitude spectrum-based features are the most widely employed front-ends for acoustic
modelling in automatic speech recognition (ASR) systems. In this paper, we investigate the …