Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

AS Subramanian, C Weng, S Watanabe, M Yu… - Computer Speech & …, 2022 - Elsevier
Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

ESPnet-SE++: Speech enhancement for robust speech recognition, translation, and understanding

YJ Lu, X Chang, C Li, W Zhang, S Cornell, Z Ni… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents recent progress on integrating speech separation and enhancement
(SSE) into the ESPnet toolkit. Compared with the previous ESPnet-SE work, numerous …

PCA dimensionality reduction method for image classification

B Zhao, X Dong, Y Guo, X Jia, Y Huang - Neural Processing Letters, 2022 - Springer
The pooling layer has achieved good results in reducing the feature dimension and
parameters of convolution neural network (CNN), but it will cause different degrees of …

[HTML] Identification of fake stereo audio using SVM and CNN

T Liu, D Yan, R Wang, N Yan, G Chen - Information, 2021 - mdpi.com
The number of channels is one of the important criteria in regard to digital audio quality.
Generally, stereo audio with two channels can provide better perceptual quality than mono …

Neural spatio-temporal beamformer for target speech separation

Y Xu, M Yu, SX Zhang, L Chen, C Weng, J Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Purely neural network (NN) based speech separation and enhancement methods, although
can achieve good objective scores, inevitably cause nonlinear speech distortions that are …

MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario

F Yu, S Zhang, P Guo, Y Liang, Z Du… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Recently cross-channel attention, which better leverages multi-channel signals from
microphone array, has shown promising results in the multi-party meeting scenario. Cross …

[PDF] Weakly-Supervised Neural Full-Rank Spatial Covariance Analysis for a Front-End System of Distant Speech Recognition.

Y Bando, T Aizawa, K Itoyama, K Nakadai - Interspeech, 2022 - isca-archive.org
This paper presents a weakly-supervised multichannel neural speech separation method for
distant speech recognition (DSR) of real conversational speech mixtures. A blind source …

Multi-channel multi-speaker ASR using 3D spatial feature

Y Shao, SX Zhang, D Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) of multi-channel multi-speaker overlapped speech
remains one of the most challenging tasks to the speech community. In this paper, we look …