A differentiable perceptual audio metric learned from just noticeable differences

H Dubey, A Aazami, V Gopal, B Naderi… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org

The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the
DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster …

Cited by 229 · Related articles · All 14 versions

MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org

The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

Cited by 238 · Related articles · All 9 versions

DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. The …

Cited by 299 · Related articles · All 4 versions

DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. We …

Cited by 206 · Related articles · All 3 versions

DreamSim: Learning new dimensions of human visual similarity using synthetic data

S Fu, N Tamir, S Sundaram, L Chai, R Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Current perceptual similarity metrics operate at the level of pixels and patches. These
metrics compare images in terms of their low-level colors and textures, but fail to capture mid …

Cited by 121 · Related articles · All 5 versions

Taming visually guided sound generation

V Iashin, E Rahtu - arXiv preprint arXiv:2110.08791, 2021 - arxiv.org

Recent advances in visually-induced audio generation are based on sampling short, low-fidelity,
and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the …

Cited by 104 · Related articles · All 6 versions

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] …

Cited by 82 · Related articles · All 5 versions

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

LM Heller, B Elizalde, B Raj, S Deshmukh - arXiv preprint arXiv …, 2023 - arxiv.org

Machine Listening, as usually formalized, attempts to perform a task that is, from our
perspective, fundamentally human-performable, and performed by humans. Current …

Cited by 12 · Related articles · All 3 versions

HiFi-GAN-2: Studio-quality speech enhancement via generative adversarial networks conditioned on acoustic features

J Su, Z Jin, A Finkelstein - … of Signal Processing to Audio and …, 2021 - ieeexplore.ieee.org

Modern speech content creation tasks such as podcasts, video voice-overs, and audio
books require studio-quality audio with full bandwidth and balanced equalization (EQ) …

Cited by 59 · Related articles · All 5 versions

Be everywhere - hear everything (BEE): Audio scene reconstruction by sparse audio-visual samples

M Chen, K Su, E Shlizerman - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Fully immersive and interactive audio-visual scenes are dynamic such that the listeners and
the sound emitters move and interact with each other. Reconstruction of an immersive sound …

Cited by 8 · Related articles · All 4 versions
