A survey on deep learning: Algorithms, techniques, and applications

S Pouyanfar, S Sadiq, Y Yan, H Tian, Y Tao… - ACM computing …, 2018 - dl.acm.org
The field of machine learning is witnessing its golden era as deep learning slowly becomes
the leader in this domain. Deep learning uses multiple layers to represent the abstractions of …

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Supervised speech separation based on deep learning: An overview

DL Wang, J Chen - IEEE/ACM transactions on audio, speech …, 2018 - ieeexplore.ieee.org
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …

A convolutional recurrent neural network for real-time speech enhancement.

K Tan, DL Wang - Interspeech, 2018 - researchgate.net
Many real-world applications of speech enhancement, such as hearing aids and cochlear
implants, desire real-time processing, with no or low latency. In this paper, we propose a …

Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks

M Kolbæk, D Yu, ZH Tan… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique.
uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker …

Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement

A Li, W Liu, C Zheng, C Fan, X Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
For challenging acoustic scenarios as low signal-to-noise ratios, current speech
enhancement systems usually suffer from performance bottleneck in extracting the target …

End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks

SW Fu, TW Wang, Y Tsao, X Lu… - IEEE/ACM Transactions …, 2018 - ieeexplore.ieee.org
Speech enhancement model is used to map a noisy speech to a clean speech. In the
training stage, an objective function is often adopted to optimize the model parameters …

Conditional generative adversarial networks for speech enhancement and noise-robust speaker verification

D Michelsanti, ZH Tan - arXiv preprint arXiv:1709.01703, 2017 - arxiv.org
Improving speech system performance in noisy environments remains a challenging task,
and speech enhancement (SE) is one of the effective techniques to solve the problem …

Long short-term memory for speaker generalization in supervised speech separation

J Chen, DL Wang - The Journal of the Acoustical Society of America, 2017 - pubs.aip.org
Speech separation can be formulated as learning to estimate a time-frequency mask from
acoustic features extracted from noisy speech. For supervised speech separation …

Audio-visual speech enhancement using multimodal deep convolutional neural networks

JC Hou, SS Wang, YH Lai, Y Tsao… - … on Emerging Topics …, 2018 - ieeexplore.ieee.org
Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques
focus only on addressing audio information. In this paper, inspired by multimodal learning …