Data center network virtualization: A survey

MF Bari, R Boutaba, R Esteves… - … surveys & tutorials, 2012 - ieeexplore.ieee.org
With the growth of data volumes and variety of Internet applications, data centers (DCs) have
become an efficient and promising infrastructure for supporting data storage, and providing …

A survey on data center networking (DCN): Infrastructure and operations

W Xia, P Zhao, Y Wen, H Xie - IEEE communications surveys & …, 2016 - ieeexplore.ieee.org
Data centers (DCs), owing to the exponential growth of Internet services, have emerged as
an irreplaceable and crucial infrastructure to power this ever-growing trend. A DC typically …

A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters

Y Jiang, Y Zhu, C Lan, B Yi, Y Cui, C Guo - 14th USENIX Symposium on …, 2020 - usenix.org
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …

HPCC: High precision congestion control

Y Li, R Miao, HH Liu, Y Zhuang, F Feng… - Proceedings of the …, 2019 - dl.acm.org
Congestion control (CC) is the key to achieving ultra-low latency, high bandwidth and
network stability in high-speed networks. From years of experience operating large-scale …

Congestion control for large-scale RDMA deployments

Y Zhu, H Eran, D Firestone, C Guo… - ACM SIGCOMM …, 2015 - dl.acm.org
Modern datacenter applications demand high throughput (40Gbps) and ultra-low latency (<10 μs
per hop) from the network, with low CPU overhead. Standard TCP/IP stacks cannot …

Re-architecting datacenter networks and stacks for low latency and high performance

M Handley, C Raiciu, A Agache, A Voinescu… - Proceedings of the …, 2017 - dl.acm.org
Modern datacenter networks provide very high capacity via redundant Clos topologies and
low switch latency, but transport protocols rarely deliver matching performance. We present …

RDMA over commodity Ethernet at scale

C Guo, H Wu, Z Deng, G Soni, J Ye, J Padhye… - Proceedings of the …, 2016 - dl.acm.org
Over the past one and a half years, we have been using RDMA over commodity Ethernet
(RoCEv2) to support some of Microsoft's highly-reliable, latency-sensitive services. This …

TIMELY: RTT-based congestion control for the datacenter

R Mittal, VT Lam, N Dukkipati, E Blem… - ACM SIGCOMM …, 2015 - dl.acm.org
Datacenter transports aim to deliver low latency messaging together with high throughput.
We show that simple packet delay, measured as round-trip times at hosts, is an effective …

CONGA: Distributed congestion-aware load balancing for datacenters

M Alizadeh, T Edsall, S Dharmapurikar… - Proceedings of the …, 2014 - dl.acm.org
We present the design, implementation, and evaluation of CONGA, a network-based
distributed congestion-aware load balancing mechanism for datacenters. CONGA exploits …

Language-directed hardware design for network performance monitoring

S Narayana, A Sivaraman, V Nathan, P Goyal… - Proceedings of the …, 2017 - dl.acm.org
Network performance monitoring today is restricted by existing switch support for
measurement, forcing operators to rely heavily on endpoints with poor visibility into the …