Watch out for the bully! Job interference study on dragonfly network

X Yang, J Jenkins, M Mubarak… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
High-radix, low-diameter dragonfly networks will be a common choice in next-generation
supercomputers. Preliminary studies show that random job placement with adaptive routing …

Experience and practice of batch scheduling on leadership supercomputers at Argonne

W Allcock, P Rich, Y Fan, Z Lan - … , JSSPP 2017, Orlando, FL, USA, June 2 …, 2018 - Springer
The mission of the DOE Argonne Leadership Computing Facility (ALCF) is to accelerate
major scientific discoveries and engineering breakthroughs for humanity by designing and …

Evaluation of an interference-free node allocation policy on fat-tree clusters

SD Pollard, N Jain, S Herbein… - … Conference for High …, 2018 - ieeexplore.ieee.org
Interference between jobs competing for network bandwidth on a fat-tree cluster can cause
significant variability and degradation in performance. These performance issues can be …

Performance optimality or reproducibility: that is the question

T Patki, JJ Thiagarajan, A Ayala, TZ Islam - Proceedings of the …, 2019 - dl.acm.org
The era of extremely heterogeneous supercomputing brings with itself the devil of increased
performance variation and reduced reproducibility. There is a lack of understanding in the …

Evaluating quality of service traffic classes on the Megafly network

M Mubarak, N McGlohon, M Musleh, E Borch… - … Conference, ISC High …, 2019 - Springer
An emerging trend in High Performance Computing (HPC) systems that use hierarchical
topologies (such as dragonfly) is that the applications are increasingly exhibiting high run-to …

Overcoming Hadoop scaling limitations through distributed task execution

K Wang, N Liu, I Sadooghi, X Yang… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
Data driven programming models like MapReduce have gained the popularity in large-scale
data processing. Although great efforts through the Hadoop implementation and framework …

Trade-off study of localizing communication and balancing network traffic on a dragonfly system

X Wang, M Mubarak, X Yang… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Dragonfly networks are being widely adopted in high-performance computing systems. On
these networks, however, interference caused by resource sharing can lead to significant …

A convergence of key‐value storage systems from clouds to supercomputers

T Li, X Zhou, K Wang, D Zhao… - Concurrency and …, 2016 - Wiley Online Library
This paper presents a convergence of distributed key‐value storage systems in clouds and
supercomputers. It specifically presents ZHT, a zero‐hop distributed key‐value store system …

Joint effects of application communication pattern, job placement and network routing on fat-tree systems

P Qiao, X Wang, X Yang, Y Fan, Z Lan - Workshop Proceedings of the …, 2018 - dl.acm.org
Among the high-radix and low-diameter networks, fat-tree topology is commonly used in
high-performance computing (HPC) and datacenter systems. Resource and job …

An analysis of long-tailed network latency distribution and background traffic on dragonfly+

M Salimi Beni, B Cosenza - International Symposium on Benchmarking …, 2022 - Springer
Modern computing systems are highly affected by large performance variability, resulting in
a long tail in the distribution of the network latency. For communication-intensive …