Mixed precision training with 8-bit floating point

JO Neill - arXiv preprint arXiv:2006.03669, 2020 - arxiv.org

Overparameterized networks trained to convergence have shown impressive performance
in domains such as computer vision and natural language processing. Pushing state of the …

Cited by 117 - Related articles - All 3 versions

Stochastic rounding: implementation, error analysis and applications

M Croci, M Fasi, NJ Higham… - Royal Society Open …, 2022 - royalsocietypublishing.org

Stochastic rounding (SR) randomly maps a real number x to one of the two nearest values in
a finite precision number system. The probability of choosing either of these two numbers is …

Cited by 54 - Related articles - All 18 versions

LLM.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc

Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …

Cited by 599 - Related articles - All 6 versions

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com

This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …

Cited by 1022 - Related articles - All 4 versions

8-bit optimizers via block-wise quantization

T Dettmers, M Lewis, S Shleifer… - arXiv preprint arXiv …, 2021 - arxiv.org

Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed
sum (SGD with momentum) or squared sum (Adam) of past gradient values. This state can …

Cited by 168 - Related articles - All 4 versions

A domain-specific supercomputer for training deep neural networks

NP Jouppi, DH Yoon, G Kurian, S Li, N Patil… - Communications of the …, 2020 - dl.acm.org

Communications of the ACM, vol. 63, no. 7, July 2020, p. 67. DOI: 10.1145/3360307. Google’s TPU …

Cited by 286 - Related articles - All 4 versions

Stable and low-precision training for large-scale vision-language models

M Wortsman, T Dettmers… - Advances in …, 2023 - proceedings.neurips.cc

We introduce new methods for 1) accelerating and 2) stabilizing training for large
language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized …

Cited by 20 - Related articles - All 6 versions

Silicon microring synapses enable photonic deep learning beyond 9-bit precision

W Zhang, C Huang, HT Peng, S Bilodeau, A Jha… - Optica, 2022 - opg.optica.org

Deep neural networks (DNNs) consist of layers of neurons interconnected by synaptic
weights. A high bit-precision in weights is generally required to guarantee high accuracy in …

Cited by 82 - Related articles - All 8 versions

A deep context learning based PCB defect detection model with anomalous trend alarming system

JY Lim, JY Lim, VM Baskaran, X Wang - Results in Engineering, 2023 - Elsevier

The quality of a printed circuit board (PCB) is paramount towards ensuring proper
functionality of electronic products. To achieve the required quality standards, substantial …

Cited by 26 - Related articles - All 2 versions

Stochastic rounding and its probabilistic backward error analysis

MP Connolly, NJ Higham, T Mary - SIAM Journal on Scientific Computing, 2021 - SIAM

Stochastic rounding rounds a real number to the next larger or smaller floating-point number
with probabilities 1 minus the relative distances to those numbers. It is gaining attention in …

Cited by 77 - Related articles - All 15 versions
