Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

MetricGAN: Generative adversarial networks based black-box metric scores optimization for speech enhancement

SW Fu, CF Liao, Y Tsao, SD Lin - … Conference on Machine …, 2019 - proceedings.mlr.press
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to
directly optimize evaluation metrics of a target task, and thus, may not always guide the …

End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks

SW Fu, TW Wang, Y Tsao, X Lu… - IEEE/ACM Transactions …, 2018 - ieeexplore.ieee.org
Speech enhancement model is used to map a noisy speech to a clean speech. In the
training stage, an objective function is often adopted to optimize the model parameters …

On loss functions for supervised monaural time-domain speech enhancement

M Kolbæk, ZH Tan, SH Jensen… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Many deep learning-based speech enhancement algorithms are designed to minimize the
mean-square error (MSE) in some transform domain between a predicted and a target …

A deep learning loss function based on the perceptual evaluation of the speech quality

JM Martin-Donas, AM Gomez… - IEEE Signal …, 2018 - ieeexplore.ieee.org
This letter proposes a perceptual metric for speech quality evaluation, which is suitable, as a
loss function, for training deep learning methods. This metric, derived from the perceptual …

Improving perceptual quality by phone-fortified perceptual loss using Wasserstein distance for speech enhancement

TA Hsieh, C Yu, SW Fu, X Lu, Y Tsao - arXiv preprint arXiv:2010.15174, 2020 - arxiv.org
Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both
related to a smooth transition in speech segments that may carry linguistic information, e.g. …

Speech enhancement using end-to-end speech recognition objectives

AS Subramanian, X Wang, MK Baskar… - … IEEE Workshop on …, 2019 - ieeexplore.ieee.org
Speech enhancement systems, which denoise and dereverberate distorted signals, are
usually optimized based on signal reconstruction objectives including the maximum …

Learning with learned loss function: Speech enhancement with Quality-Net to improve perceptual evaluation of speech quality

SW Fu, CF Liao, Y Tsao - IEEE Signal Processing Letters, 2019 - ieeexplore.ieee.org
Utilizing a human-perception-related objective function to train a speech enhancement
model has become a popular topic recently. The main reason is that the conventional mean …