New solutions on LLM acceleration, optimization, and application

Y Huang, LJ Wan, H Ye, M Jha, J Wang, Y Li… - Proceedings of the 61st …, 2024 - dl.acm.org
Large Language Models (LLMs) have revolutionized a wide range of applications with their
strong human-like understanding and creativity. Due to the continuously growing model size …

Retrieval augmented correction of named entity speech recognition errors

E Pusateri, A Walia, A Kashi, B Bandyopadhyay… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, end-to-end automatic speech recognition (ASR) systems have proven
themselves remarkably accurate and performant, but these systems still have a significant …

Personalization of CTC-based end-to-end speech recognition using pronunciation-driven subword tokenization

Z Lei, E Pusateri, S Han, L Liu, M Xu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Recent advances in deep learning and automatic speech recognition have improved the
accuracy of end-to-end speech recognition systems, but recognition of personal content …

Conformer-based speech recognition on extreme edge-computing devices

M Xu, A Jin, S Wang, M Su, T Ng, H Mason… - arXiv preprint arXiv …, 2023 - arxiv.org
With increasingly more powerful compute capabilities and resources in today's devices,
traditionally compute-intensive automatic speech recognition (ASR) has been moving from …

Contextualization of ASR with LLM using phonetic retrieval-based augmentation

Z Lei, X Na, M Xu, E Pusateri, C Van Gysel… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown superb capability of modeling multimodal
signals including audio and text, allowing the model to generate spoken or textual response …

Multi-Head Attention-Enhanced Speech Recognition for Reduced Data Requirements

Y Li, Y Zhou, Z Qiu, Y Wang, J Wang, G Huang - Electronics, 2024 - mdpi.com
Automatic speech recognition (ASR) technology has reached a mature level, and improving
performance in data-scarce scenarios has become a key research focus. In this study, we …

Enhancing CTC-based speech recognition with diverse modeling units

S Han, Z Lei, M Xu, X Na, Z Huang - arXiv preprint arXiv:2406.03274, 2024 - arxiv.org
In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR)
models has been remarkable, largely due to advances in deep learning architectures like …

Transformer-based Model for ASR N-Best Rescoring and Rewriting

IE Kang, C Van Gysel, MH Siu - arXiv preprint arXiv:2406.08207, 2024 - arxiv.org
Voice assistants increasingly use on-device Automatic Speech Recognition (ASR) to ensure
speed and privacy. However, due to resource constraints on the device, queries pertaining …

Generating Image Adversarial Example by Modifying JPEG Stream

J Zhao, S Qiao, J Wang, G Liu - … on Computer Vision and Deep Learning, 2024 - dl.acm.org
Research on adversarial attacks mainly focuses on reducing the amplitude of disturbances,
increasing the success rate of attacks, and improving attack efficiency. However, the …

Automatic Speech Recognition (ASR) Performance in Noisy Conditions; A Study on Norwegian Police Interviews

K Mølnvik - 2024 - ntnuopen.ntnu.no
This master's thesis investigates the practical application of Automatic Speech Recognition
(ASR) technology, namely the Whisper model, for transcribing Norwegian police interviews …