Multitalker speech separation with utterance-level permutation invariant training of deep...

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 224 · Related articles · All 6 versions


A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by 417 · Related articles · All 7 versions


Attention is all you need in speech separation

C Subakan, M Ravanelli, S Cornell… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-
to-sequence learning. RNNs, however, are inherently sequential models that do not allow …

Cited by 668 · Related articles · All 7 versions


Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

Y Luo, Z Chen, T Yoshioka - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

Recent studies in deep learning-based speech separation have proven the superiority of
time-domain approaches to conventional time-frequency-based methods. Unlike the time …

Cited by 889 · Related articles · All 6 versions


Recent developments on ESPnet toolkit boosted by Conformer

P Guo, F Boyer, X Chang, T Hayashi… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

In this study, we present recent developments on ESPnet: End-to-End Speech Processing
toolkit, which mainly involves a recently proposed architecture called Conformer …

Cited by 304 · Related articles · All 8 versions


Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation

J Chen, Q Mao, D Liu - arXiv preprint arXiv:2007.13975, 2020 - arxiv.org

The dominant speech separation models are based on complex recurrent or convolution
neural network that model speech sequences indirectly conditioning on context, such as …

Cited by 339 · Related articles · All 8 versions


SDR–half-baked or well done?

J Le Roux, S Wisdom, H Erdogan… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

In speech enhancement and source separation, signal-to-noise ratio is a ubiquitous
objective measure of denoising/separation quality. A decade ago, the BSS_eval toolkit was …

Cited by 1375 · Related articles · All 11 versions


Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation

Y Luo, N Mesgarani - IEEE/ACM transactions on audio, speech …, 2019 - ieeexplore.ieee.org

Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …

Cited by 2231 · Related articles · All 13 versions


An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 303 · Related articles · All 6 versions


Supervised speech separation based on deep learning: An overview

DL Wang, J Chen - IEEE/ACM transactions on audio, speech …, 2018 - ieeexplore.ieee.org

Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …

Cited by 1646 · Related articles · All 14 versions
