A Large-Scale Evaluation of Speech Foundation Models

S Yang, HJ Chang, Z Huang, AT Liu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
The foundation model paradigm leverages a shared foundation model to achieve state-of-
the-art (SOTA) performance for various tasks, requiring minimal downstream-specific data …
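
This entry describes the shared-upstream paradigm the benchmark evaluates: one pretrained speech model serves many tasks, with only small task-specific heads trained on top. Below is a minimal sketch of that setup in PyTorch; the module names (`FrozenUpstream`, `WeightedSumHead`) and the learnable layer-weighting are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Minimal sketch of the shared-foundation-model paradigm: one frozen upstream
# encoder feeds several lightweight, task-specific heads (only the heads train).
import torch
import torch.nn as nn


class FrozenUpstream(nn.Module):
    """Stand-in for a pretrained speech encoder returning per-layer features."""

    def __init__(self, num_layers: int = 12, dim: int = 768):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        for p in self.parameters():            # frozen: no gradients for the upstream
            p.requires_grad_(False)

    def forward(self, feats: torch.Tensor) -> list[torch.Tensor]:
        outs, h = [], feats
        for layer in self.layers:
            h = torch.tanh(layer(h))
            outs.append(h)
        return outs                            # hidden states from every layer


class WeightedSumHead(nn.Module):
    """Lightweight downstream head: learnable layer weights + linear classifier."""

    def __init__(self, num_layers: int, dim: int, num_classes: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, layer_feats: list[torch.Tensor]) -> torch.Tensor:
        w = torch.softmax(self.layer_weights, dim=0)
        mixed = sum(wi * f for wi, f in zip(w, layer_feats))
        return self.classifier(mixed.mean(dim=1))   # pool over time, then classify


upstream = FrozenUpstream()
heads = {"keyword_spotting": WeightedSumHead(12, 768, 10),
         "speaker_id": WeightedSumHead(12, 768, 50)}
frames = torch.randn(2, 100, 768)                   # (batch, time, feature) stand-in input
for task, head in heads.items():
    print(task, head(upstream(frames)).shape)       # one shared upstream, many heads
```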

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

Y Yang, D Raj, J Lin, N Moritz, J Jia, G Keren… - arXiv preprint arXiv …, 2024 - arxiv.org
The growing popularity of multi-channel wearable devices, such as smart glasses, has led to
a surge of applications such as targeted speech recognition and enhanced hearing …

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-Channel Multi-Speaker ASR on Arbitrary Microphone Arrays

Y Shao, Y Xu, S Khudanpur… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Spatial information is a critical clue for multi-channel multi-speaker target speech recognition.
Most state-of-the-art multi-channel Automatic Speech Recognition (ASR) systems extract …
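
The snippet breaks off where it describes how such systems extract spatial cues. One common choice is inter-channel phase difference (IPD) features computed from per-channel STFTs; the sketch below illustrates that general idea only and is not SpatialEmb's specific encoding (the reference-channel pairing and STFT parameters are assumptions).

```python
# Illustrative sketch: inter-channel phase difference (IPD) features, a common
# way to expose spatial cues to a multi-channel ASR model. General idea only,
# not the SpatialEmb encoding from the paper.
import torch


def ipd_features(multichannel_wave: torch.Tensor,
                 n_fft: int = 512, hop: int = 160,
                 ref_channel: int = 0) -> torch.Tensor:
    """multichannel_wave: (channels, samples) -> IPD: (channels-1, freq, frames)."""
    window = torch.hann_window(n_fft)
    # Per-channel complex STFT: (channels, freq, frames)
    spec = torch.stft(multichannel_wave, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    phase = torch.angle(spec)
    ref = phase[ref_channel]
    others = torch.cat([phase[:ref_channel], phase[ref_channel + 1:]], dim=0)
    # Phase difference of every channel against the reference, wrapped to [-pi, pi]
    ipd = others - ref.unsqueeze(0)
    return torch.atan2(torch.sin(ipd), torch.cos(ipd))


# Example: a 4-channel, 1-second recording at 16 kHz (random data as a stand-in)
wave = torch.randn(4, 16000)
print(ipd_features(wave).shape)   # -> torch.Size([3, 257, 101])
```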

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

Y Shao, SX Zhang, Y Xu, M Yu, D Yu, D Povey… - arXiv preprint arXiv …, 2024 - arxiv.org
In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of
discerning and accurately transcribing a target speaker's speech within background noise …
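
The title indicates that the target speaker is characterized from a segment where they speak alone. A common way to use such a segment is to map it to a speaker embedding that conditions the ASR encoder; the sketch below shows that generic conditioning pattern in PyTorch and is not this paper's architecture (the single-channel front-end, module names, and additive fusion are assumptions).

```python
# Generic sketch of target-speaker conditioning: an embedding computed from the
# target speaker's solo segment biases the ASR encoder toward that speaker.
# Illustrates the conditioning pattern only, not this paper's model.
import torch
import torch.nn as nn


class SoloSegmentEncoder(nn.Module):
    """Maps the target speaker's solo segment to a fixed-size speaker embedding."""

    def __init__(self, feat_dim: int = 80, emb_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, emb_dim)

    def forward(self, solo_feats: torch.Tensor) -> torch.Tensor:
        # (batch, frames, feat_dim) -> mean-pooled embedding (batch, emb_dim)
        return self.proj(solo_feats).mean(dim=1)


class TargetSpeakerASREncoder(nn.Module):
    """Mixture encoder whose hidden states are biased by the speaker embedding."""

    def __init__(self, feat_dim: int = 80, hidden: int = 256, vocab: int = 500):
        super().__init__()
        self.frontend = nn.Linear(feat_dim, hidden)
        self.bias = nn.Linear(256, hidden)       # projects the speaker embedding
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.output = nn.Linear(hidden, vocab)   # e.g. frame-level logits over vocab

    def forward(self, mixture_feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        h = self.frontend(mixture_feats) + self.bias(spk_emb).unsqueeze(1)
        h, _ = self.encoder(h)
        return self.output(h)                    # (batch, frames, vocab)


solo = torch.randn(1, 120, 80)        # solo-segment features of the target speaker
mixture = torch.randn(1, 400, 80)     # overlapped multi-speaker mixture features
spk_emb = SoloSegmentEncoder()(solo)
logits = TargetSpeakerASREncoder()(mixture, spk_emb)
print(logits.shape)                   # -> torch.Size([1, 400, 500])
```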