An overview of neural network compression

JO Neill - arXiv preprint arXiv:2006.03669, 2020 - arxiv.org
Overparameterized networks trained to convergence have shown impressive performance
in domains such as computer vision and natural language processing. Pushing state of the …

Stochastic rounding: implementation, error analysis and applications

M Croci, M Fasi, NJ Higham… - Royal Society Open …, 2022 - royalsocietypublishing.org
Stochastic rounding (SR) randomly maps a real number x to one of the two nearest values in
a finite precision number system. The probability of choosing either of these two numbers is …
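The snippet is cut off before it gives the probabilities. The sketch below assumes the rule quoted in the Connolly, Higham and Mary entry later in this list: round up with probability 1 minus the relative distance to the upper neighbour. It is a minimal Python illustration for a binary format with a t-bit significand. The function name `stochastic_round` and the assumption of an unbounded exponent range (no overflow, no subnormals) are illustrative choices, not taken from either paper.

```python
import math
import random


def stochastic_round(x: float, t: int, rng: random.Random) -> float:
    """Round x to a binary floating-point number with a t-bit significand,
    picking one of the two neighbouring values at random.
    Assumption: unbounded exponent range (overflow/subnormals ignored)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    _, e = math.frexp(x)              # |x| lies in [2**(e-1), 2**e)
    spacing = math.ldexp(1.0, e - t)  # gap between adjacent t-bit numbers there
    lower = math.floor(x / spacing) * spacing
    if lower == x:                    # x is already representable
        return x
    upper = lower + spacing
    # P(round up) = 1 - (upper - x) / spacing = (x - lower) / spacing
    p_up = (x - lower) / spacing
    return upper if rng.random() < p_up else lower
```

With t = 3, the value 0.1 lies between 0.09375 and 0.109375, so it rounds up with probability 0.4. Averaged over many draws the result approaches 0.1, which is the property that distinguishes stochastic rounding from round-to-nearest.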

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
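The snippet does not describe the procedure itself. The following is a simplified pure-Python sketch of vector-wise absmax int8 quantization for a matrix product. Each row of the input and each column of the weight gets its own scale. Integer codes in [-127, 127] are multiplied and summed as integers, and the result is dequantized by the product of the two scales. The paper also describes separate handling of outlier feature dimensions, which this sketch omits. The function names are hypothetical.

```python
def absmax_quantize_rows(a):
    """Quantize each row of `a` to integers in [-127, 127] using that
    row's absolute maximum; returns (integer rows, per-row scales)."""
    q, scales = [], []
    for row in a:
        amax = max(abs(v) for v in row)
        s = 127.0 / amax if amax > 0 else 1.0
        q.append([round(v * s) for v in row])
        scales.append(s)
    return q, scales


def int8_matmul(x, w):
    """Approximate x @ w for x of shape (n, k) and w of shape (k, m):
    row-wise scales for x, column-wise scales for w, integer dot
    products, then dequantization by the product of the two scales."""
    qx, sx = absmax_quantize_rows(x)
    w_cols = [list(col) for col in zip(*w)]  # columns of w as rows
    qw, sw = absmax_quantize_rows(w_cols)
    out = []
    for i, xrow in enumerate(qx):
        out.append([
            # integer accumulation (int32 on real hardware), then rescale
            sum(a * b for a, b in zip(xrow, wcol)) / (sx[i] * sw[j])
            for j, wcol in enumerate(qw)
        ])
    return out
```

Per-row and per-column scales keep one large value from crushing the resolution of every other row or column. A single scale for the whole matrix would not provide this.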

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
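One of the simplest schemes in this area is uniform (affine) quantization. A real value r maps to an integer code q = clamp(round(r / S) + Z) with scale S and zero point Z, and is recovered approximately as S(q - Z). The sketch below uses unsigned 8-bit codes and widens the range to include 0 so that zero is represented exactly. The function names are hypothetical, and conventions for the sign of the zero point vary between sources.

```python
def affine_qparams(rmin: float, rmax: float, bits: int = 8):
    """Scale and zero point mapping [rmin, rmax] onto unsigned codes
    0 .. 2**bits - 1 (range widened to include 0 so 0.0 is exact)."""
    qmin, qmax = 0, 2 ** bits - 1
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin) if rmax > rmin else 1.0
    zero_point = min(qmax, max(qmin, round(qmin - rmin / scale)))
    return scale, zero_point, qmin, qmax


def quantize(r: float, scale: float, zero_point: int, qmin: int, qmax: int) -> int:
    """q = clamp(round(r / scale) + zero_point, qmin, qmax)."""
    return min(qmax, max(qmin, round(r / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)
```

For the range [-1, 2], S = 3/255 and Z = 85. In-range values come back within S/2, and out-of-range values saturate at the code limits.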

8-bit optimizers via block-wise quantization

T Dettmers, M Lewis, S Shleifer… - arXiv preprint arXiv …, 2021 - arxiv.org
Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed
sum (SGD with momentum) or squared sum (Adam) of past gradient values. This state can …
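The snippet stops before the method. The sketch below illustrates the block-wise idea for a momentum state. The state is split into fixed-size blocks, and each block is stored as integer codes plus its own absolute maximum, so an outlier degrades only its own block. Two simplifications apply. The codes here are plain linear int8 values, whereas the paper pairs block-wise normalization with a nonlinear dynamic quantization map that is not reproduced. The update is assumed to be an undamped momentum step m ← βm + g. All names are hypothetical.

```python
def blockwise_quantize(values, block_size=4):
    """Split `values` into contiguous blocks and quantize each block to
    integer codes in [-127, 127] using that block's own absmax."""
    codes, absmaxes = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        absmaxes.append(amax)
        scale = 127 / amax if amax > 0 else 0.0
        codes.extend(round(v * scale) for v in block)
    return codes, absmaxes


def blockwise_dequantize(codes, absmaxes, block_size=4):
    """Invert blockwise_quantize up to rounding error."""
    return [c * absmaxes[i // block_size] / 127 for i, c in enumerate(codes)]


def momentum_step(codes, absmaxes, grad, beta=0.9, block_size=4):
    """One update m <- beta * m + g with the state m kept block-wise
    quantized between steps: dequantize, update in float, requantize."""
    m = blockwise_dequantize(codes, absmaxes, block_size)
    m = [beta * mi + gi for mi, gi in zip(m, grad)]
    return blockwise_quantize(m, block_size)
```

With a single global absmax, the outlier 100.0 in the test data would round every small entry such as 0.01 to code 0. With blocks of four, the first block keeps its own scale of 0.03 and those entries survive.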

A domain-specific supercomputer for training deep neural networks

NP Jouppi, DH Yoon, G Kurian, S Li, N Patil… - Communications of the …, 2020 - dl.acm.org
Communications of the ACM, Vol. 63, No. 7, July 2020, p. 67. DOI: 10.1145/3360307. Google’s TPU …

Stable and low-precision training for large-scale vision-language models

M Wortsman, T Dettmers… - Advances in …, 2023 - proceedings.neurips.cc
We introduce new methods for 1) accelerating and 2) stabilizing training for large
language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized …

Silicon microring synapses enable photonic deep learning beyond 9-bit precision

W Zhang, C Huang, HT Peng, S Bilodeau, A Jha… - Optica, 2022 - opg.optica.org
Deep neural networks (DNNs) consist of layers of neurons interconnected by synaptic
weights. A high bit-precision in weights is generally required to guarantee high accuracy in …

A deep context learning based PCB defect detection model with anomalous trend alarming system

JY Lim, JY Lim, VM Baskaran, X Wang - Results in Engineering, 2023 - Elsevier
The quality of a printed circuit board (PCB) is paramount towards ensuring proper
functionality of electronic products. To achieve the required quality standards, substantial …

Stochastic rounding and its probabilistic backward error analysis

MP Connolly, NJ Higham, T Mary - SIAM Journal on Scientific Computing, 2021 - SIAM
Stochastic rounding rounds a real number to the next larger or smaller floating-point number
with probabilities 1 minus the relative distances to those numbers. It is gaining attention in …
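The rule quoted in this entry can be written out as follows. Here $\lfloor x \rfloor$ and $\lceil x \rceil$ denote the adjacent floating-point numbers below and above a non-representable $x$; this notation is chosen for illustration. The last line shows why the scheme is unbiased in expectation.

```latex
p(x) = \frac{x - \lfloor x \rfloor}{\lceil x \rceil - \lfloor x \rfloor}
     = 1 - \frac{\lceil x \rceil - x}{\lceil x \rceil - \lfloor x \rfloor},
\qquad
\mathrm{fl}(x) =
\begin{cases}
  \lceil x \rceil  & \text{with probability } p(x),\\
  \lfloor x \rfloor & \text{with probability } 1 - p(x),
\end{cases}
\\[4pt]
\mathbb{E}[\mathrm{fl}(x)]
  = p(x)\,\lceil x \rceil + \bigl(1 - p(x)\bigr)\lfloor x \rfloor
  = \lfloor x \rfloor + p(x)\,\bigl(\lceil x \rceil - \lfloor x \rfloor\bigr)
  = x .
```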