A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

ICASSP 2023 deep noise suppression challenge

H Dubey, A Aazami, V Gopal, B Naderi… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the
DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster …

NISQA: A deep CNN-self-attention model for multidimensional speech quality prediction with crowdsourced datasets

G Mittag, B Naderi, A Chehadi, S Möller - arXiv preprint arXiv:2104.09494, 2021 - arxiv.org
In this paper, we present an update to the NISQA speech quality prediction model that is
focused on distortions that occur in communication networks. In contrast to the previous …

DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. We …

Modulation spectral signal representation for quality measurement and enhancement of wearable device data: A technical note

A Tiwari, R Cassani, S Kshirsagar, DP Tobon, Y Zhu… - Sensors, 2022 - mdpi.com
Wearable devices are burgeoning, and applications across numerous verticals are
emerging, including human performance monitoring, at-home patient monitoring, and health …

DNN no-reference PSTN speech quality prediction

G Mittag, R Cutler, Y Hosseinkashi, M Revow… - arXiv preprint arXiv …, 2020 - arxiv.org
Classic public switched telephone networks (PSTN) are often a black box for VoIP network
providers, as they have no access to performance indicators, such as delay or packet loss …

A spatial–temporal graph model for pronunciation feature prediction of Chinese poetry

Q Wang, W Liu, X Wang, X Chen… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
With the development of artificial intelligence, speech recognition and prediction have
become one of the important research domains with wide applications, such as intelligent …

Non-intrusive speech intelligibility prediction for hearing-impaired users using intermediate ASR features and human memory models

R Mogridge, G Close, R Sutherland… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Neural networks have been successfully used for non-intrusive speech intelligibility
prediction. Recently, the use of feature representations sourced from intermediate layers of …

Non-Intrusive Air Traffic Control Speech Quality Assessment with ResNet-BiLSTM

Y Wu, G Li, Q Fu - Applied Sciences, 2023 - mdpi.com
In the current field of air traffic control speech, there is a lack of effective objective speech
quality evaluation methods. This paper proposes a new network framework based on …

The effect of spoken language on speech enhancement using self-supervised speech representation loss functions

G Close, T Hain, S Goetze - … of Signal Processing to Audio and …, 2023 - ieeexplore.ieee.org
Recent work in the field of speech enhancement (SE) has involved the use of self-
supervised speech representations (SSSRs) as feature transformations in loss functions …