A language agnostic multilingual streaming on-device ASR system

K Hu, B Li, TN Sainath, Y Zhang, F Beaufays - arXiv preprint arXiv …, 2023 - arxiv.org

End-to-end models with large capacity have significantly improved multilingual automatic
speech recognition, but their computation cost poses challenges for on-device applications …

Cited by 13 - Related articles - All 4 versions

Improving multilingual and code-switching ASR using large language model generated text

K Hu, TN Sainath, B Li, Y Zhang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

We investigate using large language models (LLMs) to generate text-only training data for
improving multilingual and code-switching automatic speech recognition (ASR) through a …

Cited by 6 - Related articles

[PDF] arxiv.org

Scaling up deliberation for multilingual ASR

K Hu, B Li, TN Sainath - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org

Multilingual end-to-end automatic speech recognition models are attractive due to their
simplicity in training and deployment. Recent work on large-scale training of such models …

Cited by 8 - Related articles - All 3 versions

[PDF] arxiv.org

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

Y Yang, H Xu, H Huang, ES Chng… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

To let the state-of-the-art end-to-end ASR model enjoy data efficiency, as well as much more
unpaired text data by multi-modal training, one needs to address two problems: 1) the …

Cited by 9 - Related articles - All 4 versions

[PDF] arxiv.org

SSHR: Leveraging self-supervised hierarchical representations for multilingual automatic speech recognition

H Xue, Q Shao, K Huang, P Chen… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

Multilingual automatic speech recognition (ASR) systems have garnered attention for their
potential to extend language coverage globally. While self-supervised learning (SSL) …

Cited by 2 - Related articles - All 2 versions

A truly multilingual first pass and monolingual second pass streaming on-device ASR system

S Mavandadi, B Li, C Zhang, B Farris… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Automatic speech recognition (ASR) systems need to be accurate, have low latency, and
effectively handle language switching in order to be useful for the 60% of the world …

Cited by 4 - Related articles

[PDF] arxiv.org

Confidence-based ensembles of end-to-end speech recognition models

I Gitman, V Lavrukhin, A Laptev, B Ginsburg - arXiv preprint arXiv …, 2023 - arxiv.org

The number of end-to-end speech recognition models grows every year. These models are
often adapted to new domains or languages resulting in a proliferation of expert systems that …

Cited by 6 - Related articles - All 4 versions

[PDF] arxiv.org

Internal language model estimation based adaptive language model fusion for domain adaptation

R Ma, X Wu, J Qiu, Y Qin, H Xu, P Wu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

ASR model deployment environment is ever-changing, and the incoming speech can be
switched across different domains during a session. This brings a challenge for effective …

Cited by 4 - Related articles - All 3 versions

[PDF] arxiv.org

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

J Bai, B Li, Q Li, TN Sainath… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

The end-to-end ASR model is often desired in the streaming multilingual scenario since it is
easier to deploy and can benefit from pre-trained speech models such as powerful …

Cited by 3 - Related articles - All 4 versions

[PDF] arxiv.org

Random utterance concatenation based data augmentation for improving short-video speech recognition

YY Lin, T Han, H Xu, VT Pham, Y Khassanov… - arXiv preprint arXiv …, 2022 - arxiv.org

One of the limitations of the end-to-end automatic speech recognition (ASR) framework is that its
performance would be compromised if train-test utterance lengths are mismatched. In this …

Cited by 2 - Related articles - All 5 versions
