ICASSP 2023 deep noise suppression challenge

H Dubey, A Aazami, V Gopal, B Naderi… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the
DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster …

MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. The …

DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. We …

DreamSim: Learning new dimensions of human visual similarity using synthetic data

S Fu, N Tamir, S Sundaram, L Chai, R Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Current perceptual similarity metrics operate at the level of pixels and patches. These
metrics compare images in terms of their low-level colors and textures, but fail to capture mid …

Taming visually guided sound generation

V Iashin, E Rahtu - arXiv preprint arXiv:2110.08791, 2021 - arxiv.org
Recent advances in visually-induced audio generation are based on sampling short,
low-fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the …

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al.[1] …

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

LM Heller, B Elizalde, B Raj, S Deshmukh - arXiv preprint arXiv …, 2023 - arxiv.org
Machine Listening, as usually formalized, attempts to perform a task that is, from our
perspective, fundamentally human-performable, and performed by humans. Current …

HiFi-GAN-2: Studio-quality speech enhancement via generative adversarial networks conditioned on acoustic features

J Su, Z Jin, A Finkelstein - … of Signal Processing to Audio and …, 2021 - ieeexplore.ieee.org
Modern speech content creation tasks such as podcasts, video voice-overs, and audio
books require studio-quality audio with full bandwidth and balanced equalization (EQ) …

Be Everywhere - Hear Everything (BEE): Audio scene reconstruction by sparse audio-visual samples

M Chen, K Su, E Shlizerman - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Fully immersive and interactive audio-visual scenes are dynamic such that the listeners and
the sound emitters move and interact with each other. Reconstruction of an immersive sound …