S Liu, Q Wang, J Zhang, Q Lin, Y Liu, M Xu… - arXiv preprint arXiv …, 2020 - arxiv.org
We present NetReduce, a novel RDMA-compatible in-network reduction architecture to accelerate distributed DNN training. Compared to existing designs, NetReduce maintains a …
Model aggregation, the process that updates model parameters, is an important step for model convergence in distributed deep learning (DDL). However, the parameter server (PS) …
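For context, a minimal sketch of the parameter-server aggregation step this snippet refers to, written as synchronous SGD; the function names, learning rate, and update rule are illustrative assumptions, not taken from the cited paper:

```python
# Minimal sketch of parameter-server (PS) model aggregation in synchronous
# distributed SGD. All names here are illustrative, not from the cited paper.
import numpy as np

def ps_aggregate(worker_grads: list[np.ndarray], params: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """Average the gradients pushed by all workers, then update the model."""
    avg_grad = np.mean(worker_grads, axis=0)   # the aggregation step
    return params - lr * avg_grad              # SGD update applied on the PS

# Example: 4 workers each push a gradient for a 3-parameter model.
params = np.zeros(3)
grads = [np.random.randn(3) for _ in range(4)]
params = ps_aggregate(grads, params)
```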
Key-value stream aggregation is a common operation in distributed systems that requires intensive computation and network resources. We propose a generic in-network …
In-network computing via smart networking devices is a recent trend in modern datacenter networks. State-of-the-art switches with near line-rate computing and aggregation …
J Zhang, J Zhan, J Li, J Jin… - Concurrency and Computation: Practice and Experience, 2020 - Wiley Online Library
Exorbitant resources (computing and memory) are required to train a deep neural network (DNN). Researchers therefore often deploy distributed parallel training to …
Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an …
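The pattern named in this snippet (GROUP BY in SQL, reduce in Hadoop, segment ops in TensorFlow) can be shown with a short, self-contained sketch: partition-local hash aggregation followed by a merge of the partial results. The function names and threading setup are assumptions for illustration, not the paper's implementation:

```python
# Minimal sketch of parallel key-value aggregation (GROUP BY-style): each
# partition builds a local hash table, then the partials are merged.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_aggregate(partition: list[tuple[str, int]]) -> Counter:
    """Per-partition partial aggregation (SUM per key)."""
    acc = Counter()
    for key, value in partition:
        acc[key] += value
    return acc

def parallel_aggregate(partitions: list[list[tuple[str, int]]]) -> Counter:
    """Aggregate partitions in parallel, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(local_aggregate, partitions)
    result = Counter()
    for partial in partials:
        result.update(partial)  # merge step, analogous to the reduce phase
    return result

# Example: SELECT key, SUM(value) ... GROUP BY key over two partitions.
print(parallel_aggregate([[("a", 1), ("b", 2)], [("a", 3), ("b", 4)]]))
# Counter({'b': 6, 'a': 4})
```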
Z Wang, J Sim, E Lim, J Zhao - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Modern deep learning (DL) training is memory-consuming, constrained by the memory capacity of each computation component and cross-device communication bandwidth. In …
Deep learning has brought about a revolutionary transformation in network applications, particularly in domains like e-commerce and online advertising. Distributed training (DT), as …
Z Wang, H Lin, Y Zhu, TS Ng - arXiv preprint arXiv:2205.14465, 2022 - arxiv.org
Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal …
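To make the term concrete, here is a minimal sketch of one well-known GC scheme, top-k sparsification: only the k largest-magnitude gradient entries are transmitted. This is an illustrative example of the general technique, not the method studied in arXiv:2205.14465:

```python
# Minimal sketch of top-k gradient sparsification, one common gradient-
# compression scheme. Names and signatures are illustrative assumptions.
import numpy as np

def topk_compress(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries; send (indices, values)."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], grad.shape

def topk_decompress(idx, values, shape) -> np.ndarray:
    """Rebuild a dense (mostly zero) gradient from the sparse message."""
    flat = np.zeros(np.prod(shape))
    flat[idx] = values
    return flat.reshape(shape)

# Example: compress a 1000-element gradient down to its 10 largest entries.
g = np.random.randn(1000)
idx, vals, shape = topk_compress(g, k=10)
g_hat = topk_decompress(idx, vals, shape)
```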