Authors
Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna
Publication date
2022/8/17
Conference
2022 IEEE Symposium on High-Performance Interconnects (HOTI)
Pages
39-48
Publisher
IEEE
Description
Abstract: RDMA over Converged Ethernet (RoCE) has gained significant traction in datacenter networks due to its compatibility with conventional Ethernet-based fabrics. However, the RDMA protocol is efficient only on (nearly) lossless networks, which makes congestion control vital on RoCE networks. Unfortunately, the native RoCE congestion control scheme, based on Priority Flow Control (PFC), suffers from many drawbacks such as unfairness, head-of-line blocking, and deadlock. Therefore, in recent years many schemes have been proposed to provide additional congestion control for RoCE networks and minimize PFC's drawbacks. However, these schemes are proposed for general datacenter environments. In contrast to general datacenters, which are built using commodity hardware and run general-purpose workloads, high-performance distributed training platforms deploy high-end accelerators …
Scholar articles
T Khan, S Rashidi, S Sridharan, P Shurpali, A Akella… - 2022 IEEE Symposium on High-Performance …, 2022