Jupiter evolving: transforming Google's datacenter network via optical circuit switches and software-defined networking

L Poutievski, O Mashayekhi, J Ong, A Singh… - Proceedings of the …, 2022 - dl.acm.org
We present a decade of evolution and production experience with Jupiter datacenter
network fabrics. In this period Jupiter has delivered 5x higher speed and capacity, 30 …

Clio: A hardware-software co-designed disaggregated memory system

Z Guo, Y Shan, X Luo, Y Huang, Y Zhang - Proceedings of the 27th ACM …, 2022 - dl.acm.org
Memory disaggregation has attracted great attention recently because of its benefits in
efficient memory utilization and ease of management. So far, memory disaggregation …

MegaScale: Scaling large language model training to more than 10,000 GPUs

Z Jiang, H Lin, Y Zhong, Q Huang, Y Chen… - … USENIX Symposium on …, 2024 - usenix.org
We present the design, implementation and engineering experience in building and
deploying MegaScale, a production system for training large language models (LLMs) at the …

One-way delay measurement from traditional networks to SDN: A survey

D Chefrour - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
We expose the state of the art in the topic of one-way delay measurement in both traditional
and software-defined networks. A representative range of standard mechanisms and recent …

Empowering Azure storage with RDMA

W Bai, SS Abdeen, A Agrawal, KK Attre, P Bahl… - … USENIX Symposium on …, 2023 - usenix.org
Given the wide adoption of disaggregated storage in public clouds, networking is the key to
enabling high performance and high reliability in a cloud storage service. In Azure, we …

Programmable packet scheduling with a single queue

Z Yu, C Hu, J Wu, X Sun, V Braverman… - Proceedings of the …, 2021 - dl.acm.org
Programmable packet scheduling enables scheduling algorithms to be programmed into the
data plane without changing the hardware. Existing proposals either have no hardware …

ACC: Automatic ECN tuning for high-speed datacenter networks

S Yan, X Wang, X Zheng, Y Xia, D Liu… - Proceedings of the 2021 …, 2021 - dl.acm.org
For the widely deployed ECN-based congestion control schemes, the marking threshold is
the key to deliver high bandwidth and low latency. However, due to traffic dynamics in the …

PowerTCP: Pushing the performance limits of datacenter networks

V Addanki, O Michel, S Schmid - 19th USENIX symposium on networked …, 2022 - usenix.org
Increasingly stringent throughput and latency requirements in datacenter networks demand
fast and accurate congestion control. We observe that the reaction time and accuracy of …

eZNS: An elastic zoned namespace for commodity ZNS SSDs

J Min, C Zhao, M Liu, A Krishnamurthy - 17th USENIX Symposium on …, 2023 - usenix.org
Emerging Zoned Namespace (ZNS) SSDs, providing the coarse-grained zone abstraction,
hold the potential to significantly enhance the cost-efficiency of future storage infrastructure …

1RMA: Re-envisioning remote memory access for multi-tenant datacenters

A Singhvi, A Akella, D Gibson, TF Wenisch… - Proceedings of the …, 2020 - dl.acm.org
Remote Direct Memory Access (RDMA) plays a key role in supporting performance-hungry
datacenter applications. However, existing RDMA technologies are ill-suited to multi-tenant …