A Generic, High-Performance, Compression-Aware Framework for Data Parallel DNN Training

H Wu, S Wang, Y Bai, C Li, Q Zhou, J Yi… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Gradient compression is a promising approach to alleviating the communication bottleneck
in data parallel deep neural network (DNN) training by significantly reducing the data …
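
The snippet describes gradient compression in general terms; a minimal top-k sparsification sketch gives the flavor of how the traffic reduction works (illustrative only, not the paper's framework; the 1% ratio and function names are assumptions):

    import torch

    def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
        # Keep only the largest-magnitude `ratio` fraction of gradient entries.
        flat = grad.flatten()
        k = max(1, int(flat.numel() * ratio))
        _, indices = torch.topk(flat.abs(), k)
        return flat[indices], indices, grad.shape      # values, positions, original shape

    def topk_decompress(values, indices, shape):
        # Rebuild a dense gradient that is zero everywhere except the kept entries.
        flat = torch.zeros(shape, dtype=values.dtype).flatten()
        flat[indices] = values
        return flat.reshape(shape)

    grad = torch.randn(1024, 1024)
    vals, idx, shape = topk_compress(grad)             # roughly 1% of the original volume
    restored = topk_decompress(vals, idx, shape)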

AggTree: A Routing Tree With In-Network Aggregation for Distributed Training

J Nie, W Wu - 2023 IEEE International Performance, Computing …, 2023 - ieeexplore.ieee.org
For distributed training (DT) based on the parameter server (PS) architecture, the
network communication overhead of synchronizing parameters with the servers is huge. In the …
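
For context, the baseline PS synchronization pattern that in-network aggregation schemes such as AggTree try to relieve can be sketched as follows (a toy NumPy sketch with an assumed worker count, not the paper's routing-tree design):

    import numpy as np

    class ParameterServer:
        def __init__(self, model_size: int):
            self.params = np.zeros(model_size, dtype=np.float32)

        def push_and_pull(self, worker_grads: list, lr: float = 0.1):
            # Every worker pushes its full gradient each iteration; the PS averages,
            # updates, and every worker pulls the new parameters back.
            self.params -= lr * np.mean(worker_grads, axis=0)
            return self.params

    ps = ParameterServer(model_size=4)
    grads = [np.random.randn(4).astype(np.float32) for _ in range(8)]   # 8 workers
    new_params = ps.push_and_pull(grads)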

BPCM: a flexible high-speed bypass parallel communication mechanism for GPU cluster

M Wu, Q Chen, J Wang - IEEE Access, 2020 - ieeexplore.ieee.org
With the increasing complexity of computational tasks faced by artificial intelligence
technology, the scale of machine learning models continues to expand, and the data volume …

OmNICCL: Zero-cost Sparse AllReduce with Direct Cache Access and SmartNICs

T Gu, J Fei, M Canini - Proceedings of the 2024 SIGCOMM Workshop on …, 2024 - dl.acm.org
AllReduce is a collective communication pattern commonly used in Distributed Deep
Learning (DDL) and High Performance Computing (HPC). Sparse AllReduce, which …
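
A conceptual sketch of a sparse AllReduce, built here from an allgather of (index, value) pairs with torch.distributed (assumes an already-initialized process group; this is not OmNICCL's SmartNIC/Direct Cache Access path):

    import torch
    import torch.distributed as dist

    def sparse_allreduce(grad: torch.Tensor, k: int) -> torch.Tensor:
        flat = grad.flatten()
        _, idx = torch.topk(flat.abs(), k)
        # Indices are cast to float so values and positions travel in one 2 x k tensor
        # (exact while the gradient has fewer than 2**24 elements).
        payload = torch.stack([idx.float(), flat[idx]])

        gathered = [torch.empty_like(payload) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, payload)                   # exchange sparse payloads only

        dense = torch.zeros_like(flat)
        for chunk in gathered:                               # accumulate every rank's entries
            dense.index_add_(0, chunk[0].long(), chunk[1])
        return dense.div_(dist.get_world_size()).reshape(grad.shape)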

Libra: Contention-Aware GPU Thread Allocation for Data Parallel Training in High Speed Networks

Y Liu, B Jiang, S Zhao, T Lin, X Wang… - IEEE INFOCOM 2023 …, 2023 - ieeexplore.ieee.org
Overlapping gradient communication with backward computation is a popular technique to
reduce communication cost in the widely adopted data parallel S-SGD training. However …
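
The overlap itself is commonly implemented with per-parameter gradient hooks that launch asynchronous collectives during the backward pass; a minimal sketch (assumes an initialized torch.distributed process group and PyTorch >= 2.1; Libra's contention-aware GPU thread allocation is not shown):

    import torch
    import torch.distributed as dist

    def attach_overlap_hooks(model: torch.nn.Module, handles: list):
        # Launch an async all_reduce for each gradient as soon as it is accumulated,
        # so communication overlaps the rest of the backward computation.
        def hook(param: torch.Tensor):
            handles.append(dist.all_reduce(param.grad, async_op=True))
        for p in model.parameters():
            if p.requires_grad:
                p.register_post_accumulate_grad_hook(hook)   # PyTorch >= 2.1

    # Per iteration: run loss.backward(), wait on all handles, divide each p.grad by the
    # world size, call optimizer.step(), then clear the handle list.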

Heterogeneity-aware asynchronous decentralized training

Q Luo, J He, Y Zhuo, X Qian - arXiv preprint arXiv:1909.08029, 2019 - arxiv.org
Distributed deep learning training usually adopts All-Reduce as the synchronization
mechanism for data parallel algorithms due to its high performance in homogeneous …
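
In contrast to a global All-Reduce, decentralized training lets each worker mix parameters only with its neighbours; a toy NumPy simulation of ring gossip averaging (the topology and step count are illustrative, not the paper's exact protocol):

    import numpy as np

    def gossip_step(workers: list) -> list:
        # Each worker averages its parameters with its two ring neighbours only,
        # avoiding a global synchronization barrier across all workers.
        n = len(workers)
        return [(workers[i - 1] + workers[i] + workers[(i + 1) % n]) / 3.0 for i in range(n)]

    params = [np.random.randn(4) for _ in range(8)]          # 8 workers' local copies
    for _ in range(20):                                      # repeated mixing approaches the mean
        params = gossip_step(params)
    print(np.std(np.stack(params), axis=0))                  # disagreement shrinks every round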

Enhancing the performance assessment of network-based and machine learning for module availability estimation

AL Challoob, AH Hussein - International Journal of System …, 2024 - inderscienceonline.com
Interpreting network telemetry data is difficult: the size and volume of the data that networks
produce keep rising. ML predicts traffic trends to help decision-making. Classification and …

FreezePipe: An efficient dynamic pipeline parallel approach based on freezing mechanism for distributed DNN training

C Weng, Z Shu, Z Xu, J Zhang, J Luo… - … Cooperative Work in …, 2023 - ieeexplore.ieee.org
Deep Neural Network (DNN) training at large scale is extremely time-consuming and
computationally intensive, and is therefore accelerated with distributed training. In recent years …
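
The freezing mechanism named in the title amounts to disabling gradient computation for front layers once they stabilize; a minimal PyTorch sketch (the convergence criterion and FreezePipe's pipeline repartitioning are not reproduced, and freeze_prefix is an illustrative name):

    import torch.nn as nn

    def freeze_prefix(model: nn.Sequential, num_frozen: int):
        # Stop computing gradients (and disable training-mode behaviour) for the
        # first `num_frozen` layers, shrinking backward work and gradient traffic.
        for layer in list(model)[:num_frozen]:
            layer.eval()
            for p in layer.parameters():
                p.requires_grad_(False)

    model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
    freeze_prefix(model, num_frozen=2)    # e.g. freeze the first block once it has stabilized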

An in-network parameter aggregation using DPDK for multi-GPU deep learning

M Furukawa, T Itsubo… - 2020 Eighth International …, 2020 - ieeexplore.ieee.org
In distributed deep neural network training using remote GPU nodes, communication occurs
iteratively between remote nodes for gradient aggregation. This communication latency …
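
Conceptually, the aggregation an in-network node performs is an N-to-1 reduction of same-shaped gradient chunks; a hypothetical sketch of that step only (recv_chunk and send_chunk are placeholder callbacks, and the DPDK packet I/O the paper relies on is not shown):

    import numpy as np

    def aggregate_round(recv_chunk, send_chunk, num_workers: int, chunk_len: int):
        # recv_chunk/send_chunk are hypothetical I/O callbacks, not a real DPDK API.
        acc = np.zeros(chunk_len, dtype=np.float32)
        for w in range(num_workers):
            acc += recv_chunk(w)          # one same-shaped gradient chunk per worker
        send_chunk(acc)                   # N inbound messages collapse into 1 outbound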

Understanding the performance of in-network computing: A case study

F Yang, Z Wang, X Ma, G Yuan… - 2019 IEEE Intl Conf on …, 2019 - ieeexplore.ieee.org
Numerous distributed applications, including machine learning and big data analysis, have
suffered performance degradation from network bottlenecks. To solve this problem …