An overview of neural network compression

J O'Neill - arXiv preprint arXiv:2006.03669, 2020 - arxiv.org
Overparameterized networks trained to convergence have shown impressive performance
in domains such as computer vision and natural language processing. Pushing state of the …

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
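
The kernel-feature idea behind Performers can be sketched briefly. The snippet below is a minimal NumPy illustration, not the authors' FAVOR+ implementation: it approximates softmax attention with positive random features so the cost grows linearly in sequence length. The function names, feature count, and scaling choices shown are assumptions made for illustration.

import numpy as np

def positive_random_features(x, W):
    # Positive random features (FAVOR+-style sketch): phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m).
    m = W.shape[0]
    proj = x @ W.T                                      # (n, m)
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(proj - sq_norm) / np.sqrt(m)

def linear_attention(Q, K, V, n_features=256, seed=0):
    # Approximates softmax attention with linear (rather than quadratic) cost in sequence length.
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, d))
    scale = d ** -0.25                                  # split the usual 1/sqrt(d) between Q and K
    q_prime = positive_random_features(Q * scale, W)    # (n, m)
    k_prime = positive_random_features(K * scale, W)    # (n, m)
    kv = k_prime.T @ V                                  # (m, d_v); the n x n attention matrix is never formed
    normalizer = q_prime @ k_prime.sum(axis=0)          # (n,)
    return (q_prime @ kv) / normalizer[:, None]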

CTformer: convolution-free Token2Token dilated vision transformer for low-dose CT denoising

D Wang, F Fan, Z Wu, R Liu, F Wang… - Physics in Medicine & …, 2023 - iopscience.iop.org
Objective. Low-dose computed tomography (LDCT) denoising is an important problem in CT
research. Compared to the normal dose CT, LDCT images are subjected to severe noise …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Audio ALBERT: A lite BERT for self-supervised learning of audio representation

PH Chi, PH Chung, TH Wu, CC Hsieh… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Self-supervised speech models are powerful speech representation extractors for
downstream applications. Recently, larger models have been utilized in acoustic model …

ReTransformer: ReRAM-based processing-in-memory architecture for transformer acceleration

X Yang, B Yan, H Li, Y Chen - … of the 39th International Conference on …, 2020 - dl.acm.org
Transformer has emerged as a popular deep neural network (DNN) model for Natural
Language Processing (NLP) applications and demonstrated excellent performance in …

Lessons on parameter sharing across layers in transformers

S Takase, S Kiyono - arXiv preprint arXiv:2104.06022, 2021 - arxiv.org
We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The
proposed approach relaxes a widely used technique, which shares parameters for one layer …
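
As a rough illustration of cross-layer parameter sharing, the sketch below reuses a small pool of Transformer layers across a deeper stack with a simple cyclic assignment. This is not the authors' exact method; the class name, the PyTorch layer choice, and the cyclic assignment are assumptions made for illustration (the paper studies several assignment strategies).

import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    # Sketch: N encoder positions share a pool of M < N unique layers.
    def __init__(self, d_model=512, nhead=8, num_unique=3, num_positions=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_unique)
        )
        # Cyclic assignment: position i uses layer i mod M, so parameters repeat as 0,1,2,0,1,2,...
        self.assignment = [i % num_unique for i in range(num_positions)]

    def forward(self, x):
        for idx in self.assignment:
            x = self.layers[idx](x)
        return x

# Usage: a 6-layer-deep encoder holding only 3 layers' worth of parameters.
encoder = SharedLayerEncoder()
out = encoder(torch.randn(2, 10, 512))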

Masked language modeling for proteins via linearly scalable long-context transformers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer models have achieved state-of-the-art results across a diverse range of
domains. However, concern over the cost of training the attention mechanism to learn …

Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation

M Behnke, K Heafield - The 2020 Conference on Empirical …, 2020 - research.ed.ac.uk
The attention mechanism is the crucial component of the transformer architecture. Recent
research shows that most attention heads are not confident in their decisions and can be …
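
A hedged sketch of confidence-based head pruning: the functions below score each head by its average maximum attention weight and keep only the most confident ones. The confidence proxy, function names, and keep ratio are assumptions for illustration, not the paper's exact criterion.

import torch

def head_confidence(attn_probs):
    # attn_probs: (batch, heads, tgt_len, src_len) attention distributions.
    # Assumed confidence proxy: a head's average maximum attention weight per query.
    return attn_probs.max(dim=-1).values.mean(dim=(0, 2))   # (heads,)

def head_mask(attn_probs, keep_ratio=0.5):
    # Keep the most confident heads and zero out the rest (multiply head outputs by this mask).
    conf = head_confidence(attn_probs)
    k = max(1, int(keep_ratio * conf.numel()))
    mask = torch.zeros_like(conf)
    mask[conf.topk(k).indices] = 1.0
    return mask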