Mixture-of-expert conformer for streaming multilingual ASR

K Hu, B Li, TN Sainath, Y Zhang, F Beaufays - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end models with large capacity have significantly improved multilingual automatic
speech recognition, but their computation cost poses challenges for on-device applications …

Improving multilingual and code-switching ASR using large language model generated text

K Hu, TN Sainath, B Li, Y Zhang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We investigate using large language models (LLMs) to generate text-only training data for
improving multilingual and code-switching automatic speech recognition (ASR) through a …

Scaling up deliberation for multilingual ASR

K Hu, B Li, TN Sainath - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
Multilingual end-to-end automatic speech recognition models are attractive due to their
simplicity in training and deployment. Recent work on large-scale training of such models …

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

Y Yang, H Xu, H Huang, ES Chng… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
To let the state-of-the-art end-to-end ASR model enjoy data efficiency, as well as much more
unpaired text data by multi-modal training, one needs to address two problems: 1) the …

SSHR: Leveraging self-supervised hierarchical representations for multilingual automatic speech recognition

H Xue, Q Shao, K Huang, P Chen… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Multilingual automatic speech recognition (ASR) systems have garnered attention for their
potential to extend language coverage globally. While self-supervised learning (SSL) …

A truly multilingual first pass and monolingual second pass streaming on-device ASR system

S Mavandadi, B Li, C Zhang, B Farris… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Automatic speech recognition (ASR) systems need to be accurate, have low latency, and
effectively handle language switching in order to be useful for the 60% of the world …

Confidence-based ensembles of end-to-end speech recognition models

I Gitman, V Lavrukhin, A Laptev, B Ginsburg - arXiv preprint arXiv …, 2023 - arxiv.org
The number of end-to-end speech recognition models grows every year. These models are
often adapted to new domains or languages resulting in a proliferation of expert systems that …

Internal language model estimation based adaptive language model fusion for domain adaptation

R Ma, X Wu, J Qiu, Y Qin, H Xu, P Wu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
ASR model deployment environment is ever-changing, and the incoming speech can be
switched across different domains during a session. This brings a challenge for effective …

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

J Bai, B Li, Q Li, TN Sainath… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The end-to-end ASR model is often desired in the streaming multilingual scenario since it is
easier to deploy and can benefit from pre-trained speech models such as powerful …

Random utterance concatenation based data augmentation for improving short-video speech recognition

YY Lin, T Han, H Xu, VT Pham, Y Khassanov… - arXiv preprint arXiv …, 2022 - arxiv.org
One of the limitations of the end-to-end automatic speech recognition (ASR) framework is that its
performance would be compromised if train-test utterance lengths are mismatched. In this …