LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models

J Lim, Y Kwon, R Hwang, K Maeng, E Suh… - Proceedings of the 29th …, 2024 - dl.acm.org
Differential privacy (DP) is widely employed in industry as a practical standard for
privacy protection. While private training of computer vision or natural language processing …
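
For context on what private training typically involves, below is a minimal sketch of one DP-SGD step (per-example gradient clipping plus calibrated Gaussian noise), the standard baseline for private training. It is not LazyDP's algorithm; names such as `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One differentially private SGD update (generic DP-SGD sketch).

    per_example_grads: shape (batch_size, num_params), one gradient row per example.
    """
    # 1) Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))

    # 2) Sum the clipped gradients and add Gaussian noise calibrated to the clip norm.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_sum = clipped.sum(axis=0) + noise

    # 3) Average over the batch and take a gradient step.
    return params - lr * noisy_sum / per_example_grads.shape[0]

# Example: batch of 8 examples, model with 4 parameters.
params = np.zeros(4)
grads = np.random.randn(8, 4)
params = dp_sgd_step(params, grads)
```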

NetReduce: RDMA-compatible in-network reduction for distributed DNN training acceleration

S Liu, Q Wang, J Zhang, Q Lin, Y Liu, M Xu… - arXiv preprint arXiv …, 2020 - arxiv.org
We present NetReduce, a novel RDMA-compatible in-network reduction architecture to
accelerate distributed DNN training. Compared to existing designs, NetReduce maintains a …
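
As a rough illustration of what in-network reduction means, the toy model below has a "switch" accumulate fixed-size gradient chunks from workers and emit their element-wise sum once every worker has contributed. This is a simplified sketch, not NetReduce's RDMA-compatible design; the class and method names are invented for illustration.

```python
import numpy as np

class ToySwitchAggregator:
    """Toy stand-in for a programmable switch that reduces gradient chunks."""

    def __init__(self, num_workers, chunk_size):
        self.num_workers = num_workers
        self.buffer = np.zeros(chunk_size)   # per-slot accumulator
        self.seen = set()                    # workers counted for the current chunk

    def receive(self, worker_id, chunk):
        """Accumulate a worker's chunk; return the reduced chunk when complete."""
        if worker_id in self.seen:
            return None                      # retransmission; ignore in this toy model
        self.buffer += chunk
        self.seen.add(worker_id)
        if len(self.seen) == self.num_workers:
            result = self.buffer.copy()
            self.buffer[:] = 0.0
            self.seen.clear()
            return result                    # a real switch would multicast this back
        return None

switch = ToySwitchAggregator(num_workers=4, chunk_size=8)
for w in range(4):
    out = switch.receive(w, np.ones(8) * (w + 1))
print(out)  # element-wise sum across workers: [10. 10. ... 10.]
```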

Elastic model aggregation with parameter service

J Gu, M Chowdhury, KG Shin, A Akella - arXiv preprint arXiv:2204.03211, 2022 - arxiv.org
Model aggregation, the process that updates model parameters, is an important step for
model convergence in distributed deep learning (DDL). However, the parameter server (PS) …
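
For reference, here is a minimal sketch of the parameter-server (PS) aggregation step the abstract refers to, assuming a synchronous push/aggregate/pull protocol; the `ParameterServer` class and its methods are illustrative, not the paper's parameter-service API.

```python
import numpy as np

class ParameterServer:
    """Synchronous PS sketch: workers push gradients, server aggregates and updates."""

    def __init__(self, num_params, lr=0.01):
        self.params = np.zeros(num_params)
        self.lr = lr
        self.pending = []                    # gradients pushed since the last update

    def push(self, grad):
        self.pending.append(grad)

    def aggregate(self):
        """Average pending gradients and apply one SGD update (model aggregation)."""
        if self.pending:
            avg_grad = np.mean(self.pending, axis=0)
            self.params -= self.lr * avg_grad
            self.pending.clear()
        return self.params

    def pull(self):
        return self.params.copy()

ps = ParameterServer(num_params=4)
for _ in range(3):                           # three workers push their gradients
    ps.push(np.random.randn(4))
new_params = ps.aggregate()                  # aggregation step; workers then pull()
```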

A generic service to provide in-network aggregation for key-value streams

Y He, W Wu, Y Le, M Liu, CL Lao - Proceedings of the 28th ACM …, 2023 - dl.acm.org
Key-value stream aggregation is a common operation in distributed systems that requires
intensive computation and network resources. We propose a generic in-network …
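
A minimal host-side sketch of key-value stream aggregation, assuming a simple sum reducer: records arrive as (key, value) pairs and are folded into a running aggregate per key. The in-network version the paper proposes would offload this fold to switches; function names here are illustrative.

```python
from collections import defaultdict

def aggregate_stream(stream, reduce_fn=lambda acc, v: acc + v, init=0):
    """Fold a stream of (key, value) pairs into one aggregate per key."""
    state = defaultdict(lambda: init)
    for key, value in stream:
        state[key] = reduce_fn(state[key], value)
    return dict(state)

stream = [("a", 1), ("b", 2), ("a", 3), ("c", 5), ("b", 4)]
print(aggregate_stream(stream))  # {'a': 4, 'b': 6, 'c': 5}
```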

SOAR: Minimizing Network Utilization Cost With Bounded In-Network Computing

R Segal, C Avin, G Scalosub - IEEE Transactions on Network …, 2023 - ieeexplore.ieee.org
In-network computing via smart networking devices is a recent trend in modern datacenter
networks. State-of-the-art switches with near line-rate computing and aggregation …

Optimizing execution for pipelined‐based distributed deep learning in a heterogeneously networked GPU cluster

J Zhang, J Zhan, J Li, J Jin… - … and Computation: Practice …, 2020 - Wiley Online Library
Exorbitant resources (computing and memory) are required to train a deep neural network
(DNN). Researchers therefore often deploy distributed parallel training to …

Chasing similarity: Distribution-aware aggregation scheduling

F Liu, A Salmasi, S Blanas, A Sidiropoulos - Proceedings of the VLDB …, 2018 - dl.acm.org
Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP
BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an …
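
A minimal two-phase sketch of the parallel aggregation pattern named here (GROUP BY / reduce): each worker pre-aggregates its local partition, then the partial results are merged by key. The scheduling question the paper studies, namely which nodes merge which partials, is not modeled; names are illustrative.

```python
from collections import Counter

def local_aggregate(partition):
    """Phase 1: a worker groups and sums its own (key, value) rows."""
    partial = Counter()
    for key, value in partition:
        partial[key] += value
    return partial

def merge(partials):
    """Phase 2: combine per-worker partial aggregates into the final GROUP BY result."""
    total = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds counts key-wise
    return dict(total)

partitions = [
    [("x", 1), ("y", 2), ("x", 3)],   # worker 0's rows
    [("y", 4), ("z", 5)],             # worker 1's rows
]
print(merge(local_aggregate(p) for p in partitions))  # {'x': 4, 'y': 6, 'z': 5}
```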

Enabling efficient large-scale deep learning training with cache coherent disaggregated memory systems

Z Wang, J Sim, E Lim, J Zhao - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Modern deep learning (DL) training is memory-intensive, constrained by the memory
capacity of each computation component and by cross-device communication bandwidth. In …

InArt: In-Network Aggregation with Route Selection for Accelerating Distributed Training

J Liu, Y Zhai, G Zhao, H Xu, J Fang, Z Zeng… - Proceedings of the ACM …, 2024 - dl.acm.org
Deep learning has brought about a revolutionary transformation in network applications,
particularly in domains like e-commerce and online advertising. Distributed training (DT), as …

ByteComp: Revisiting gradient compression in distributed training

Z Wang, H Lin, Y Zhu, TS Ng - arXiv preprint arXiv:2205.14465, 2022 - arxiv.org
Gradient compression (GC) is a promising approach to addressing the communication
bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal …
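
As a concrete example of gradient compression, the sketch below implements top-k sparsification, one common GC scheme in which only the k largest-magnitude gradient entries (values plus indices) are transmitted instead of the dense tensor. This is a generic illustration, not ByteComp's selection method; function names are illustrative.

```python
import numpy as np

def topk_compress(grad, k):
    """Return (indices, values) of the k largest-magnitude gradient entries."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx, values, size):
    """Rebuild a dense gradient that is zero everywhere except the kept entries."""
    dense = np.zeros(size)
    dense[idx] = values
    return dense

grad = np.random.randn(1000)
idx, vals = topk_compress(grad, k=10)              # send ~1% of the original volume
restored = topk_decompress(idx, vals, grad.size)   # receiver-side reconstruction
```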