Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

4-bit conformer with native quantization aware training for speech recognition

S Ding, P Meadowlark, Y He, L Lew, S Agrawal… - arXiv preprint arXiv …, 2022 - arxiv.org
Reducing the latency and model size has always been a significant research problem for
live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model …

2-bit conformer quantization for automatic speech recognition

O Rybakov, P Meadowlark, S Ding, D Qiu, J Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large speech models are rapidly gaining traction in the research community. As a result, model
compression has become an important topic, so that these models can fit in memory and be …

Sub-8-bit quantization for on-device speech recognition: A regularization-free approach

K Zhen, M Radfar, H Nguyen, GP Strimel… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
For on-device automatic speech recognition (ASR), quantization aware training (QAT) is
ubiquitous for achieving the trade-off between model predictive performance and efficiency …
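Several of the entries above rely on quantization aware training. A minimal, generic sketch of the core mechanism — "fake" quantization, where weights are quantized and immediately dequantized in the forward pass so training sees the rounding error — is shown below. This is an illustrative assumption of the common symmetric per-tensor scheme, not the specific method of any paper listed here; the function name and bit widths are hypothetical.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Symmetric per-tensor fake quantization (quantize then dequantize).

    The returned array is float-valued but constrained to the grid an
    integer quantizer would produce, so a training loop that uses it in
    the forward pass exposes the model to quantization error.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 7 for 4-bit signed
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer levels
    return q * scale                         # back to float for training

# Example: 4-bit fake quantization of a small weight vector.
w = np.array([0.31, -1.2, 0.05, 0.77])
w4 = fake_quantize(w, num_bits=4)
```

In a QAT setup, the backward pass typically treats the round operation as identity (a straight-through estimator), so gradients flow to the underlying float weights.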

Rand: Robustness aware norm decay for quantized seq2seq models

D Qiu, D Rim, S Ding, O Rybakov, Y He - arXiv preprint arXiv:2305.15536, 2023 - arxiv.org
With the rapid increase in the size of neural networks, model compression has become an
important area of research. Quantization is an effective technique for decreasing the model …

The Role of Feature Correlation on Quantized Neural Networks

D Qiu, S Ding, Y He - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
With the growing need for large models in speech recognition, quantization has become a
valuable technique to reduce their compute and memory transfer costs. Quantized models …

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

E Fish, U Michieli, M Ozay - arXiv preprint arXiv:2307.12659, 2023 - arxiv.org
Recent advancements in Automatic Speech Recognition (ASR) have produced large AI
models, which are impractical for deployment on mobile devices. Model quantization is …

[PDF][PDF] Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus

Z Li, T Wang, J Deng, J Xu, S Hu, X Liu - 2023 - isca-archive.org
State-of-the-art end-to-end automatic speech recognition (ASR) systems are becoming
increasingly complex and expensive for practical applications. This paper develops a high …

Espnet-ONNX: Bridging a gap between research and production

M Someki, Y Higuchi, T Hayashi… - 2022 Asia-Pacific …, 2022 - ieeexplore.ieee.org
In the field of deep learning, researchers often focus on inventing novel neural network
models and improving benchmarks. In contrast, application developers are interested in …

En-hacn: Enhancing hybrid architecture with fast attention and capsule network for end-to-end speech recognition

B Lyu, C Fan, Y Ming, P Zhao… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Automatic speech recognition (ASR) is a fundamental technology in the field of artificial
intelligence. End-to-end (E2E) ASR is favored for its state-of-the-art performance. However …