LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models

J Lim, Y Kwon, R Hwang, K Maeng, E Suh… - Proceedings of the 29th …, 2024 - dl.acm.org
Differential privacy (DP) is widely employed in industry as a practical standard for
privacy protection. While private training of computer vision or natural language processing …
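
For context on what private training typically involves, below is a minimal sketch of one DP-SGD step (per-example gradient clipping plus calibrated Gaussian noise), the standard baseline for private training. It is not LazyDP's algorithm; names such as `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One differentially private SGD update (generic DP-SGD sketch).

    per_example_grads: shape (batch_size, num_params), one gradient row per example.
    """
    # 1) Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))

    # 2) Sum the clipped gradients and add Gaussian noise calibrated to the clip norm.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_sum = clipped.sum(axis=0) + noise

    # 3) Average over the batch and take a gradient step.
    return params - lr * noisy_sum / per_example_grads.shape[0]

# Example: batch of 8 examples, model with 4 parameters.
params = np.zeros(4)
grads = np.random.randn(8, 4)
params = dp_sgd_step(params, grads)
```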

NetReduce: RDMA-compatible in-network reduction for distributed DNN training acceleration

S Liu, Q Wang, J Zhang, Q Lin, Y Liu, M Xu… - arXiv preprint arXiv …, 2020 - arxiv.org
We present NetReduce, a novel RDMA-compatible in-network reduction architecture to
accelerate distributed DNN training. Compared to existing designs, NetReduce maintains a …
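
As a rough illustration of what in-network reduction means, the toy model below has a "switch" accumulate fixed-size gradient chunks from workers and emit their element-wise sum once every worker has contributed. This is a simplified sketch, not NetReduce's RDMA-compatible design; the class and method names are invented for illustration.

```python
import numpy as np

class ToySwitchAggregator:
    """Toy stand-in for a programmable switch that reduces gradient chunks."""

    def __init__(self, num_workers, chunk_size):
        self.num_workers = num_workers
        self.buffer = np.zeros(chunk_size)   # per-slot accumulator
        self.seen = set()                    # workers counted for the current chunk

    def receive(self, worker_id, chunk):
        """Accumulate a worker's chunk; return the reduced chunk when complete."""
        if worker_id in self.seen:
            return None                      # retransmission; ignore in this toy model
        self.buffer += chunk
        self.seen.add(worker_id)
        if len(self.seen) == self.num_workers:
            result = self.buffer.copy()
            self.buffer[:] = 0.0
            self.seen.clear()
            return result                    # a real switch would multicast this back
        return None

switch = ToySwitchAggregator(num_workers=4, chunk_size=8)
for w in range(4):
    out = switch.receive(w, np.ones(8) * (w + 1))
print(out)  # element-wise sum across workers: [10. 10. ... 10.]
```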

Elastic model aggregation with parameter service

J Gu, M Chowdhury, KG Shin, A Akella - arXiv preprint arXiv:2204.03211, 2022 - arxiv.org
Model aggregation, the process that updates model parameters, is an important step for
model convergence in distributed deep learning (DDL). However, the parameter server (PS) …
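
For reference, here is a minimal sketch of the parameter-server (PS) aggregation step the abstract refers to, assuming a synchronous push/aggregate/pull protocol; the `ParameterServer` class and its methods are illustrative, not the paper's parameter-service API.

```python
import numpy as np

class ParameterServer:
    """Synchronous PS sketch: workers push gradients, server aggregates and updates."""

    def __init__(self, num_params, lr=0.01):
        self.params = np.zeros(num_params)
        self.lr = lr
        self.pending = []                    # gradients pushed since the last update

    def push(self, grad):
        self.pending.append(grad)

    def aggregate(self):
        """Average pending gradients and apply one SGD update (model aggregation)."""
        if self.pending:
            avg_grad = np.mean(self.pending, axis=0)
            self.params -= self.lr * avg_grad
            self.pending.clear()
        return self.params

    def pull(self):
        return self.params.copy()

ps = ParameterServer(num_params=4)
for _ in range(3):                           # three workers push their gradients
    ps.push(np.random.randn(4))
new_params = ps.aggregate()                  # aggregation step; workers then pull()
```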

A generic service to provide in-network aggregation for key-value streams

Y He, W Wu, Y Le, M Liu, CL Lao - Proceedings of the 28th ACM …, 2023 - dl.acm.org
Key-value stream aggregation is a common operation in distributed systems that requires
intensive computation and network resources. We propose a generic in-network …
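
A minimal host-side sketch of key-value stream aggregation, assuming a simple sum reducer: records arrive as (key, value) pairs and are folded into a running aggregate per key. The in-network version the paper proposes would offload this fold to switches; function names here are illustrative.

```python
from collections import defaultdict

def aggregate_stream(stream, reduce_fn=lambda acc, v: acc + v, init=0):
    """Fold a stream of (key, value) pairs into one aggregate per key."""
    state = defaultdict(lambda: init)
    for key, value in stream:
        state[key] = reduce_fn(state[key], value)
    return dict(state)

stream = [("a", 1), ("b", 2), ("a", 3), ("c", 5), ("b", 4)]
print(aggregate_stream(stream))  # {'a': 4, 'b': 6, 'c': 5}
```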

SOAR: Minimizing Network Utilization Cost With Bounded In-Network Computing

R Segal, C Avin, G Scalosub - IEEE Transactions on Network …, 2023 - ieeexplore.ieee.org
In-network computing via smart networking devices is a recent trend in modern datacenter
networks. State-of-the-art switches with near line-rate computing and aggregation …

Optimizing execution for pipelined‐based distributed deep learning in a heterogeneously networked GPU cluster

J Zhang, J Zhan, J Li, J Jin… - … and Computation: Practice …, 2020 - Wiley Online Library
Exorbitant resources (computing and memory) are required to train a deep neural network
(DNN). Researchers therefore often deploy distributed parallel training to …

Chasing similarity: Distribution-aware aggregation scheduling

F Liu, A Salmasi, S Blanas, A Sidiropoulos - Proceedings of the VLDB …, 2018 - dl.acm.org
Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP
BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an …
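
A minimal two-phase sketch of the parallel aggregation pattern named here (GROUP BY / reduce): each worker pre-aggregates its local partition, then the partial results are merged by key. The scheduling question the paper studies, namely which nodes merge which partials, is not modeled; names are illustrative.

```python
from collections import Counter

def local_aggregate(partition):
    """Phase 1: a worker groups and sums its own (key, value) rows."""
    partial = Counter()
    for key, value in partition:
        partial[key] += value
    return partial

def merge(partials):
    """Phase 2: combine per-worker partial aggregates into the final GROUP BY result."""
    total = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds counts key-wise
    return dict(total)

partitions = [
    [("x", 1), ("y", 2), ("x", 3)],   # worker 0's rows
    [("y", 4), ("z", 5)],             # worker 1's rows
]
print(merge(local_aggregate(p) for p in partitions))  # {'x': 4, 'y': 6, 'z': 5}
```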

Enabling efficient large-scale deep learning training with cache coherent disaggregated memory systems

Z Wang, J Sim, E Lim, J Zhao - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Modern deep learning (DL) training is memory-intensive, constrained by the memory
capacity of each computation component and by cross-device communication bandwidth. In …

InArt: In-Network Aggregation with Route Selection for Accelerating Distributed Training

J Liu, Y Zhai, G Zhao, H Xu, J Fang, Z Zeng… - Proceedings of the ACM …, 2024 - dl.acm.org
Deep learning has brought about a revolutionary transformation in network applications,
particularly in domains like e-commerce and online advertising. Distributed training (DT), as …

ByteComp: Revisiting gradient compression in distributed training

Z Wang, H Lin, Y Zhu, TS Ng - arXiv preprint arXiv:2205.14465, 2022 - arxiv.org
Gradient compression (GC) is a promising approach to addressing the communication
bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal …
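
As a concrete example of gradient compression, the sketch below implements top-k sparsification, one common GC scheme in which only the k largest-magnitude gradient entries (values plus indices) are transmitted instead of the dense tensor. This is a generic illustration, not ByteComp's selection method; function names are illustrative.

```python
import numpy as np

def topk_compress(grad, k):
    """Return (indices, values) of the k largest-magnitude gradient entries."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx, values, size):
    """Rebuild a dense gradient that is zero everywhere except the kept entries."""
    dense = np.zeros(size)
    dense[idx] = values
    return dense

grad = np.random.randn(1000)
idx, vals = topk_compress(grad, k=10)              # send ~1% of the original volume
restored = topk_decompress(idx, vals, grad.size)   # receiver-side reconstruction
```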