XAgg: Accelerating Heterogeneous Distributed Training Through XDP-Based Gradient Aggregation

Q Zhang, G Zhao, H Xu, P Yang - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
With the growth of model/dataset/system size for distributed model training in datacenters,
the widely used Parameter Server (PS) architecture suffers from a communication bottleneck …
