The increasing complexity of HPC systems has introduced new sources of variability, which can contribute to significant differences in run-to-run performance of applications. With …
A Khan, H Sim, SS Vazhkudai, AR Butt… - … Conference on High …, 2021 - dl.acm.org
Supercomputer design is a complex, multi-dimensional optimization process, wherein several subsystems need to be reconciled to meet a desired figure of merit performance for …
K Wen, P Samadi, S Rumley, CP Chen… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
The Dragonfly topology provides low-diameter connectivity for high-performance computing with all-to-all global links at the inter-group level. Our traffic matrix characterization of various …
Torus networks are an attractive topology in supercomputing, balancing the tradeoff between network diameter and hardware costs. The nodes in a torus network are connected …
System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate …
S Jha, A Patke, J Brandt, A Gentile, B Lim… - … USENIX Symposium on …, 2020 - usenix.org
While it is widely acknowledged that network congestion in High Performance Computing (HPC) systems can significantly degrade application performance, there has been little to no …
Application performance variability caused by network contention is a major issue on dragonfly-based systems. This work-in-progress study makes two contributions. First, we …
T Groves, Y Gu, NJ Wright - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
This work evaluates performance variability in the Cray Aries dragonfly network and characterizes its impact on MPI Allreduce. The execution time of Allreduce is limited by the …
Large-scale high-performance computing (HPC) systems in the form of supercomputers and warehouse-scale data centers permeate nearly every corner of modern life from applications …