Far-field location guided target speech extraction using end-to-end speech recognition objectives

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

AS Subramanian, C Weng, S Watanabe, M Yu… - Computer Speech & …, 2022 - Elsevier

Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …

Cited by 70 · Related articles · All 5 versions

[PDF] arxiv.org

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

Cited by 79 · Related articles · All 5 versions

[PDF] arxiv.org

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

Cited by 7 · Related articles · All 6 versions

[PDF] arxiv.org

ESPnet-SE++: Speech enhancement for robust speech recognition, translation, and understanding

YJ Lu, X Chang, C Li, W Zhang, S Cornell, Z Ni… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper presents recent progress on integrating speech separation and enhancement
(SSE) into the ESPnet toolkit. Compared with the previous ESPnet-SE work, numerous …

Cited by 22 · Related articles · All 8 versions

PCA dimensionality reduction method for image classification

B Zhao, X Dong, Y Guo, X Jia, Y Huang - Neural Processing Letters, 2022 - Springer

The pooling layer has achieved good results in reducing the feature dimension and
parameters of convolution neural network (CNN), but it will cause different degrees of …

Cited by 24 · Related articles · All 3 versions

[HTML] mdpi.com

[HTML] Identification of fake stereo audio using SVM and CNN

T Liu, D Yan, R Wang, N Yan, G Chen - Information, 2021 - mdpi.com

The number of channels is one of the important criteria in regard to digital audio quality.
Generally, stereo audio with two channels can provide better perceptual quality than mono …

Cited by 26 · Related articles · All 4 versions

[PDF] arxiv.org

Neural spatio-temporal beamformer for target speech separation

Y Xu, M Yu, SX Zhang, L Chen, C Weng, J Liu… - arXiv preprint arXiv …, 2020 - arxiv.org

Purely neural network (NN) based speech separation and enhancement methods, although
can achieve good objective scores, inevitably cause nonlinear speech distortions that are …

Cited by 42 · Related articles · All 8 versions

[PDF] arxiv.org

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario

F Yu, S Zhang, P Guo, Y Liang, Z Du… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Recently cross-channel attention, which better leverages multi-channel signals from
microphone array, has shown promising results in the multi-party meeting scenario. Cross …

Cited by 11 · Related articles · All 3 versions

[PDF] isca-archive.org

[PDF] Weakly-Supervised Neural Full-Rank Spatial Covariance Analysis for a Front-End System of Distant Speech Recognition.

Y Bando, T Aizawa, K Itoyama, K Nakadai - Interspeech, 2022 - isca-archive.org

This paper presents a weakly-supervised multichannel neural speech separation method for
distant speech recognition (DSR) of real conversational speech mixtures. A blind source …

Cited by 7 · Related articles · All 2 versions

[PDF] arxiv.org

Multi-channel multi-speaker ASR using 3D spatial feature

Y Shao, SX Zhang, D Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Automatic speech recognition (ASR) of multi-channel multi-speaker overlapped speech
remains one of the most challenging tasks to the speech community. In this paper, we look …

Cited by 12 · Related articles · All 3 versions
