MESH2IR: Neural acoustic impulse response generator for complex 3D scenes

A Ratnarajah, Z Tang, R Aralikatti… - Proceedings of the 30th …, 2022 - dl.acm.org
We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse
responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create …

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

End-to-end integration of speech recognition, dereverberation, beamforming, and self-supervised learning representation

Y Masuyama, X Chang, S Cornell… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Self-supervised learning representation (SSLR) has demonstrated its significant
effectiveness in automatic speech recognition (ASR), mainly with clean speech. Recent work …

Directional speech recognition for speaker disambiguation and cross-talk suppression

J Lin, N Moritz, R Xie, K Kalgaonkar… - Proc. INTERSPEECH …, 2023 - isca-archive.org
With advances in mobile computing, smart glasses are becoming powerful enough to
generate real-time closed captions of live conversations. Such a system must distinguish …

End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis

C Cui, I Sheikh, M Sadeghi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present an end-to-end multichannel speaker-attributed automatic speech recognition
(MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross …

The FOSAFER system for the CHiME-8 MMCSG challenge

S Huang, D Zhang, Y Wang, J Deng… - CHiME Workshop on …, 2024 - isca-archive.org
This paper presents the system designed by FOSAFER for the CHiME-8 MMCSG challenge.
Our system generates text transcriptions with speaker attributes from natural conversations …

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

Y Shao, SX Zhang, D Yu - arXiv preprint arXiv:2311.00146, 2023 - arxiv.org
Multi-channel multi-talker automatic speech recognition (ASR) presents ongoing challenges
within the speech community, particularly when confronted with significant reverberation …

Real Time Detection and Tracking in Multi Speakers Video Conferencing

N Affes, J Ktari, N Ben Amor, T Frikha… - … Conference on Intelligent …, 2022 - Springer
Currently, the videoconferencing market is growing worldwide with annual growth (CAGR)
up to 10%. Several companies appreciated this technique during the Coronavirus lockdown …

SepLocNet: Multi-speaker localization with separation-guided TDOA estimation in wireless acoustic sensor networks

X Dang, A Herzog, SR Chetupalli, EAP Habets, H Liu - Applied Acoustics, 2025 - Elsevier
Time difference of arrival (TDOA)-based multi-speaker localization allows three-dimensional
localization using a low-cost wireless acoustic sensor network (WASN). However, two …

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-Channel Multi-Speaker ASR on Arbitrary Microphone Arrays

Y Shao, Y Xu, S Khudanpur… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Spatial information is a critical clue for multi-channel multispeaker target speech recognition.
Most state-of-the-art multi-channel Automatic Speech Recognition (ASR) systems extract …