Authors
Peng Sun, Yonggang Wen, Ruobing Han, Wansen Feng, Shengen Yan
Publication date
2019/12/4
Journal
IEEE Transactions on Big Data
Volume
8
Issue
2
Pages
495-507
Publisher
IEEE
Description
It is important to scale out deep neural network (DNN) training to reduce model training time. High communication overhead is one of the major performance bottlenecks for distributed DNN training across multiple GPUs. Our investigations have shown that popular open-source DNN systems could only achieve a speedup ratio of 2.5 on 64 GPUs connected by a 56 Gbps network. To address this problem, we propose a communication backend named GradientFlow for distributed DNN training and employ a set of network optimization techniques. First, we integrate ring-based allreduce, mixed-precision training, and computation/communication overlap into GradientFlow. Second, we propose lazy allreduce to improve network throughput by fusing multiple communication operations into a single one, and design coarse-grained sparse communication to reduce network traffic by only transmitting important gradient …
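The lazy allreduce idea mentioned in the abstract (fusing many small gradient reductions into one large collective call) can be illustrated with a minimal sketch. This is not the paper's GradientFlow implementation; it is an assumption-laden example using PyTorch's `torch.distributed` API, and the names `LazyAllreduceBucket` and `threshold_bytes` are illustrative only.

```python
import torch
import torch.distributed as dist


class LazyAllreduceBucket:
    """Buffers gradient tensors and fuses them into a single allreduce call."""

    def __init__(self, threshold_bytes=64 * 1024 * 1024):
        self.threshold_bytes = threshold_bytes  # fuse once ~64 MB of gradients are buffered
        self.pending = []                       # gradient tensors awaiting reduction
        self.pending_bytes = 0

    def add(self, grad: torch.Tensor):
        """Queue a gradient; trigger a fused allreduce when the buffer is large enough."""
        self.pending.append(grad)
        self.pending_bytes += grad.numel() * grad.element_size()
        if self.pending_bytes >= self.threshold_bytes:
            self.flush()

    def flush(self):
        """Fuse all pending gradients into one flat buffer and reduce it once."""
        if not self.pending:
            return
        # One large message over a ring-based allreduce utilizes bandwidth far
        # better than many small collectives, which is the point of lazy allreduce.
        flat = torch.cat([g.reshape(-1) for g in self.pending])
        dist.all_reduce(flat, op=dist.ReduceOp.SUM)
        flat /= dist.get_world_size()           # average gradients across workers
        # Copy the reduced values back into the original gradient tensors.
        offset = 0
        for g in self.pending:
            n = g.numel()
            g.copy_(flat[offset:offset + n].view_as(g))
            offset += n
        self.pending.clear()
        self.pending_bytes = 0
```

In a training loop, each parameter's gradient would be passed to `add()` after the backward pass, with a final `flush()` before the optimizer step; the threshold trades off fusion benefit against the opportunity to overlap communication with remaining computation.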