A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters

Y Jiang, Y Zhu, C Lan, B Yi, Y Cui, C Guo - 14th USENIX Symposium on …, 2020 - usenix.org
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …

Scaling distributed machine learning with in-network aggregation

A Sapio, M Canini, CY Ho, J Nelson, P Kalnis… - … USENIX Symposium on …, 2021 - usenix.org
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

ATP: In-network aggregation for multi-tenant learning

CL Lao, Y Le, K Mahajan, Y Chen, W Wu… - … USENIX Symposium on …, 2021 - usenix.org
Distributed deep neural network training (DT) systems are widely deployed in clusters where
the network is shared across multiple tenants, i.e., multiple DT jobs. Each DT job computes …

In-network aggregation for data center networks: A survey

A Feng, D Dong, F Lei, J Ma, E Yu, R Wang - Computer Communications, 2023 - Elsevier
Aggregation applications are widely deployed in data centers, such as distributed machine
learning and MapReduce-like frameworks. These applications typically have large …

Distributed hierarchical GPU parameter server for massive scale deep learning ads systems

W Zhao, D Xie, R Jia, Y Qian, R Ding… - … of Machine Learning …, 2020 - proceedings.mlsys.org
Neural networks of ads systems usually take input from multiple resources, e.g., query-ad
relevance, ad features and user portraits. These inputs are encoded into one-hot or multi-hot …

GRACE: A compressed communication framework for distributed machine learning

H Xu, CY Ho, AM Abdelmoniem, A Dutta… - 2021 IEEE 41st …, 2021 - ieeexplore.ieee.org
Powerful computer clusters are used nowadays to train complex deep neural networks
(DNNs) on large datasets. Distributed training is increasingly communication bound …

Priority-based parameter propagation for distributed DNN training

A Jayarajan, J Wei, G Gibson… - Proceedings of …, 2019 - proceedings.mlsys.org
Data parallel training is widely used for scaling distributed deep neural network (DNN)
training. However, the performance benefits are often limited by the communication-heavy …

Efficient sparse collective communication and its application to accelerate distributed deep learning

J Fei, CY Ho, AN Sahu, M Canini, A Sapio - Proceedings of the 2021 …, 2021 - dl.acm.org
Efficient collective communication is crucial to parallel-computing applications such as
distributed training of large-scale recommendation systems and natural language …

Accelerating decentralized federated learning in heterogeneous edge computing

L Wang, Y Xu, H Xu, M Chen… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
In edge computing (EC), federated learning (FL) enables massive numbers of devices to collaboratively
train AI models without exposing local data. In order to avoid the possible bottleneck of the …