High-throughput and flexible host networking for accelerated computing

A Skiadopoulos, Z Xie, M Zhao, Q Cai… - … USENIX Symposium on …, 2024 - usenix.org
Modern network hardware is able to meet the stringent bandwidth demands of applications
like GPU-accelerated AI. However, existing host network stacks offer a hard tradeoff …

RAMBDA: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications

Y Yuan, J Huang, Y Sun, T Wang… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Responding to the "datacenter tax" and "killer microseconds" problems for memory-intensive
datacenter applications, diverse solutions including Smart NIC-based ones have been …

Network load balancing with in-network reordering support for RDMA

CH Song, XZ Khooi, R Joshi, I Choi, J Li… - Proceedings of the ACM …, 2023 - dl.acm.org
Remote Direct Memory Access (RDMA) is widely used in high-performance computing
(HPC) and data center networks. In this paper, we first show that RDMA does not work well …

Cepheus: accelerating datacenter applications with high-performance RoCE-capable multicast

W Li, J Zhang, Y Liu, G Zeng, Z Wang… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Modern datacenter applications widely exhibit multicast communication patterns.
Meanwhile, RDMA is emerging as the de-facto networking architecture to meet the stringent …

Towards fine-grained and practical flow control for datacenter networks

W Li, C Zeng, J Hu, K Chen - 2023 IEEE 31st International …, 2023 - ieeexplore.ieee.org
As datacenter networks continue to support a wider range of applications and faster link
speeds, they face the challenge of managing bursty traffic and transient congestion. End-to …

Rail-only: A Low-Cost High-Performance Network for Training LLMs with Trillion Parameters

W Wang, M Ghobadi, K Shakeri… - arXiv preprint arXiv …, 2023 - people.csail.mit.edu
This paper challenges the well-established paradigm for building any-to-any networks for
training Large Language Models (LLMs). We show that LLMs exhibit a unique …

Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression

W Han, S Vargaftik, M Mitzenmacher, B Karp… - Proceedings of the 23rd …, 2024 - dl.acm.org
Gradient aggregation has long been identified as a major bottleneck in today's large-scale
distributed machine learning training systems. One promising solution to mitigate such …

RDMA Transports in Datacenter Networks: Survey

J Hu, H Shen, X Liu, J Wang - IEEE Network, 2024 - ieeexplore.ieee.org
Remote Direct Memory Access (RDMA) has become an important building block of modern
datacenter network (DCN) infrastructure given the merits of kernel bypass, zero memory …

LEFT: LightwEight and FasT packet Reordering for RDMA

P Huang, X Zhang, Z Chen, C Liu, G Chen - Proceedings of the 8th Asia …, 2024 - dl.acm.org
RDMA, as a cutting-edge networking technology, has gained extensive adoption in large-
scale data centers due to its exceptional characteristics, such as low and stable latency, high …

PPT: A Pragmatic Transport for Datacenters

L Suo, Y Pang, W Li, R Pei, K Li, X Liu, X He… - Proceedings of the …, 2024 - dl.acm.org
This paper introduces PPT, a pragmatic transport that achieves comparable performance to
proactive transports while maintaining good deployability as reactive transports. Our key …