An integrated tutorial on InfiniBand, verbs, and MPI

P MacArthur, Q Liu, RD Russell… - … Surveys & Tutorials, 2017 - ieeexplore.ieee.org
This tutorial presents the details of the interconnection network utilized in many high
performance computing (HPC) systems today. “InfiniBand” is the hardware interconnect …

Fail-in-place network design: interaction between topology, routing algorithm and failures

J Domke, T Hoefler, S Matsuoka - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org
The growing system size of high performance computers results in a steady decrease of the
mean time between failures. Exchanging network components often requires whole system …

On the relation between congestion control, switch arbitration and fairness

EG Gran, E Zahavi, SA Reinemo… - 2011 11th IEEE/ACM …, 2011 - ieeexplore.ieee.org
In lossless interconnection networks such as InfiniBand, congestion control (CC) can be an
effective mechanism to achieve high performance and good utilization of network resources …

Making the network scalable: Inter-subnet routing in InfiniBand

B Bogdański, BD Johnsen, SA Reinemo… - Euro-Par 2013 Parallel …, 2013 - Springer
As InfiniBand clusters grow in size and complexity, the need arises to segment the network
into manageable sections. Up until now, InfiniBand routers have not been used extensively …

A self-adaptive network for HPC clouds: Architecture, framework, and implementation

F Zahid, A Taherkordi, EG Gran, T Skeie… - … on Parallel and …, 2018 - ieeexplore.ieee.org
Clouds offer flexible and economically attractive compute and storage solutions for
enterprises. However, the effectiveness of cloud computing for high-performance computing …

Exploring the scope of the InfiniBand congestion control mechanism

EG Gran, SA Reinemo, O Lysne, T Skeie… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
In a lossless interconnection network, network congestion needs to be detected and
resolved to ensure high performance and good utilization of network resources at high …

Optimized routing for fat-tree topologies

B Bogdanski - submitted for the degree of Philosophy …, 2014 - web-backend.simula.no
In recent years, InfiniBand has become one of the leading interconnects for
high-performance systems. InfiniBand is not only the most popular interconnect used in the …

FlowStar: Fast Convergence Per-Flow State Accurate Congestion Control for InfiniBand

C Luo, H Gu, L Zhu, H Zhang - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
According to the latest TOP500 list, InfiniBand (IB) is the most widely used network
architecture in the top 10 supercomputers. IB relies on Credit-based Flow Control (CBFC) to …

A weighted fat-tree routing algorithm for efficient load-balancing in InfiniBand enterprise clusters

F Zahid, EG Gran, B Bogdanski… - 2015 23rd Euromicro …, 2015 - ieeexplore.ieee.org
InfiniBand (IB) has become a popular network interconnect for high performance computing
(HPC) systems. Many of the large IB-based HPC systems use some variant of the fat-tree …

A measurement study of congestion in an InfiniBand network

F Alali, F Mizero, M Veeraraghavan… - 2017 Network Traffic …, 2017 - ieeexplore.ieee.org
This paper presents a measurement study of congestion on a production, highly utilized,
72K-core InfiniBand cluster called Yellowstone. The measurement study consists of a 23 …