Authors
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan
Publication date
2016/12/13
Conference
2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)
Pages
1110-1117
Publisher
IEEE
Description
Many distributed machine learning (ML) systems exhibit high communication overhead when dealing with big data sets. Our investigations showed that popular distributed ML systems could spend about an order of magnitude more time on network communication than on computation to train ML models containing millions of parameters. Such high communication overhead is mainly caused by two operations: pulling parameters and pushing gradients. In this paper, we propose an approach called Timed Dataflow (TDF) to address this problem by reducing network traffic using three techniques: a timed parameter storage system, a hybrid parameter filter and a hybrid gradient filter. In particular, the timed parameter storage technique and the hybrid parameter filter enable servers to discard unchanged parameters during the pull operation, and the hybrid gradient filter allows servers to drop gradients selectively during …
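The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical Python sketch of the core idea, assuming a key/value parameter store with logical timestamps: the names `TimedParameterStore`, `push`, `pull`, and `drop_threshold` are illustrative, not the paper's API, and the magnitude-based gradient drop is one plausible reading of "drop gradients selectively".

```python
import numpy as np

class TimedParameterStore:
    """Hypothetical sketch of TDF-style timed parameter storage.

    Each parameter block carries a logical timestamp of its last
    update, so a pull can skip blocks unchanged since the client's
    previous pull, and near-zero gradients are dropped on push.
    Names and the magnitude test are assumptions, not the paper's
    actual filter design.
    """

    def __init__(self, lr=0.01, drop_threshold=1e-4):
        self.params = {}    # key -> np.ndarray of parameter values
        self.version = {}   # key -> logical timestamp of last update
        self.clock = 0      # logical clock, advanced on accepted pushes
        self.lr = lr
        self.drop_threshold = drop_threshold

    def push(self, key, gradient):
        """Apply a gradient unless it is too small to matter."""
        # Gradient filter (assumed form): discard near-zero updates
        # instead of applying and propagating them.
        if np.max(np.abs(gradient)) < self.drop_threshold:
            return False
        old = self.params.get(key, np.zeros_like(gradient))
        self.params[key] = old - self.lr * gradient
        self.clock += 1
        self.version[key] = self.clock
        return True

    def pull(self, since):
        """Return only blocks updated after `since`, plus the new clock.

        Unchanged parameters are never serialized or sent, which is
        where the pull-side traffic reduction comes from.
        """
        fresh = {k: v for k, v in self.params.items()
                 if self.version[k] > since}
        return fresh, self.clock


# Example round trip: the second pull transfers nothing because the
# tiny gradient was filtered and no parameter actually changed.
store = TimedParameterStore()
store.push("layer1/w", np.ones(4))
snapshot, seen = store.pull(since=0)         # full pull
store.push("layer1/w", np.full(4, 1e-6))     # dropped by the filter
fresh, seen = store.pull(since=seen)         # fresh == {}
```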
Total citations
[Per-year citation histogram, 2017–2024; individual counts garbled in extraction]