New solutions on LLM acceleration, optimization, and application

Y Huang, LJ Wan, H Ye, M Jha, J Wang, Y Li… - Proceedings of the 61st …, 2024 - dl.acm.org
Large Language Models (LLMs) have revolutionized a wide range of applications with their
strong human-like understanding and creativity. Due to the continuously growing model size …

Acoustic model fusion for end-to-end speech recognition

Z Lei, M Xu, S Han, L Liu, Z Huang, T Ng… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Recent advances in deep learning and automatic speech recognition (ASR) have enabled
the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E …

Retrieval augmented correction of named entity speech recognition errors

E Pusateri, A Walia, A Kashi, B Bandyopadhyay… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, end-to-end automatic speech recognition (ASR) systems have proven
themselves remarkably accurate and performant, but these systems still have a significant …

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation

R Huang, M Yarmohammadi, S Khudanpur… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing research suggests that automatic speech recognition (ASR) models can benefit
from additional contexts (e.g., contact lists, user specified vocabulary). Rare words and …

Conformer-based speech recognition on extreme edge-computing devices

M Xu, A Jin, S Wang, M Su, T Ng, H Mason… - arXiv preprint arXiv …, 2023 - arxiv.org
With increasingly more powerful compute capabilities and resources in today's devices,
traditionally compute-intensive automatic speech recognition (ASR) has been moving from …

Contextualization of ASR with LLM using phonetic retrieval-based augmentation

Z Lei, X Na, M Xu, E Pusateri, C Van Gysel… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown superb capability of modeling multimodal
signals including audio and text, allowing the model to generate spoken or textual response …

Enhancing CTC-based speech recognition with diverse modeling units

S Han, Z Lei, M Xu, X Na, Z Huang - arXiv preprint arXiv:2406.03274, 2024 - arxiv.org
In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR)
models has been remarkable, largely due to advances in deep learning architectures like …

AugMixSpeech: A Data Augmentation Method and Consistency Regularization for Mandarin Automatic Speech Recognition

Y Jiang, J Chen, K Han, Y Liu, S Ma, Y Song… - … Conference on Natural …, 2024 - Springer
Automatic speech recognition (ASR) is a crucial technology in the field of artificial
intelligence, widely applied in modern society. The deep learning-based ASR method offers …