Survey of deep learning paradigms for speech processing

KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer
Over the past decades, particular focus has been given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …

Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

Attentive statistics pooling for deep speaker embedding

K Okabe, T Koshinaka, K Shinoda - arXiv preprint arXiv:1803.10963, 2018 - arxiv.org
This paper proposes attentive statistics pooling for deep speaker embedding in text-
independent speaker verification. In conventional speaker embedding, frame-level features …

Fullsubnet: A full-band and sub-band fusion model for real-time single-channel speech enhancement

X Hao, X Su, R Horaud, X Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
This paper proposes a full-band and sub-band fusion model, named FullSubNet, for
single-channel real-time speech enhancement. Full-band and sub-band refer to the models …

Soundspaces 2.0: A simulation platform for visual-acoustic learning

C Chen, C Schissler, S Garg… - Advances in …, 2022 - proceedings.neurips.cc
Abstract We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio
rendering for 3D environments. Given a 3D mesh of a real-world environment …

Detection and classification of acoustic scenes and events: Outcome of the DCASE 2016 challenge

A Mesaros, T Heittola, E Benetos… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
Public evaluation campaigns and datasets promote active development in target research
areas, allowing direct comparison of algorithms. The second edition of the challenge on …

CMGAN: Conformer-based metric GAN for speech enhancement

R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …

TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …