Authors
Jeremy T Johnston, Steven R Young, Catherine D Schuman, Junghoon Chae, Don D March, Robert M Patton, Thomas E Potok
Publication date
2019/11/18
Conference paper
2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)
Pages
9-18
Publisher
IEEE
Description
As deep convolutional neural networks (CNNs) have become increasingly popular and successful at an ever-widening range of machine learning tasks, specialized hardware for training and deploying them has become increasingly available. NVIDIA's recent Volta architecture includes tensor cores, which perform a fused multiply-accumulate in reduced and mixed precision (16-bit multiply, 32-bit accumulate). Recent research indicates that, typically, very little training accuracy is lost when half precision is used in place of single precision, and that performance gains can be made by doing arithmetic in reduced precision. In this work we demonstrate that making layer-by-layer choices of arithmetic/data precision can lead to further performance improvement. In our study of 25,200 CNNs we demonstrate an average speedup (over purely half precision) of 1.27x, and speedups as high as 3.64x, by appropriately …
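The abstract's core idea, choosing arithmetic precision per layer rather than network-wide, can be sketched in a few lines. Below is a minimal, hypothetical PyTorch illustration (PyTorch is an assumption; the paper does not name a framework here, and the two-layer network and its precision assignment are invented for illustration): one convolution runs in float16, where Volta-class tensor cores can accelerate it, while the rest of the network stays in float32.

```python
import torch
import torch.nn as nn

class PerLayerPrecisionCNN(nn.Module):
    """Toy CNN where precision is chosen layer by layer (illustrative only)."""

    def __init__(self):
        super().__init__()
        # conv1 is assigned half precision: its weights are stored as float16,
        # so on a Volta-class GPU its multiplies are eligible for tensor cores.
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1).half()
        # conv2 and the classifier stay in single precision (float32).
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, 10)

    def forward(self, x):
        # Cast the activation down to float16 for the half-precision layer,
        # then back up to float32 before the single-precision layers.
        x = torch.relu(self.conv1(x.half()).float())
        x = torch.relu(self.conv2(x))
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.fc(x)

model = PerLayerPrecisionCNN().cuda()
out = model(torch.randn(8, 3, 32, 32, device="cuda"))
```

Note that each precision boundary adds a cast, so whether a given assignment pays off depends on the layer in question; weighing that tradeoff across many networks (25,200 CNNs in the study) is what the paper measures.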
Total citations