Watch out for the bully! job interference study on dragonfly network

X Yang, J Jenkins, M Mubarak… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
High-radix, low-diameter dragonfly networks will be a common choice in next-generation
supercomputers. Preliminary studies show that random job placement with adaptive routing …

Run-to-run variability on Xeon Phi based Cray XC systems

S Chunduri, K Harms, S Parker, V Morozov… - Proceedings of the …, 2017 - dl.acm.org
The increasing complexity of HPC systems has introduced new sources of variability, which
can contribute to significant differences in run-to-run performance of applications. With …

An analysis of system balance and architectural trends based on Top500 supercomputers

A Khan, H Sim, SS Vazhkudai, AR Butt… - … Conference on High …, 2021 - dl.acm.org
Supercomputer design is a complex, multi-dimensional optimization process, wherein
several subsystems need to be reconciled to meet a desired figure of merit performance for …

Flexfly: Enabling a reconfigurable dragonfly through silicon photonics

K Wen, P Samadi, S Rumley, CP Chen… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
The Dragonfly topology provides low-diameter connectivity for high-performance computing
with all-to-all global links at the inter-group level. Our traffic matrix characterization of various …

Visualizing the topology and data traffic of multi-dimensional torus interconnect networks

S Cheng, W Zhong, KE Isaacs, K Mueller - IEEE Access, 2018 - ieeexplore.ieee.org
Torus networks are an attractive topology in supercomputing, balancing the tradeoff
between network diameter and hardware costs. The nodes in a torus network are connected …

Mitigating network noise on dragonfly networks through application-aware routing

D De Sensi, S Di Girolamo, T Hoefler - Proceedings of the International …, 2019 - dl.acm.org
System noise can negatively impact the performance of HPC systems, and the
interconnection network is one of the main factors contributing to this problem. To mitigate …

Measuring Congestion in High-Performance Datacenter Interconnects

S Jha, A Patke, J Brandt, A Gentile, B Lim… - … USENIX Symposium on …, 2020 - usenix.org
While it is widely acknowledged that network congestion in High Performance Computing
(HPC) systems can significantly degrade application performance, there has been little to no …

The effect of system utilization on application performance variability

B Li, S Chunduri, K Harms, Y Fan, Z Lan - Proceedings of the 9th …, 2019 - dl.acm.org
Application performance variability caused by network contention is a major issue on
dragonfly based systems. This work-in-progress study makes two contributions. First, we …

Understanding performance variability on the Aries dragonfly network

T Groves, Y Gu, NJ Wright - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
This work evaluates performance variability in the Cray Aries dragonfly network and
characterizes its impact on MPI Allreduce. The execution time of Allreduce is limited by the …

Optical interconnection networks for high-performance systems

Q Cheng, M Glick, K Bergman - Optical fiber telecommunications VII, 2020 - Elsevier
Large-scale high-performance computing (HPC) systems in the form of supercomputers and
warehouse-scale data centers permeate nearly every corner of modern life from applications …