Predictive performance modeling for distributed batch processing using black box monitoring and machine learning

C Witt, M Bux, W Gusew, U Leser - Information Systems, 2019 - Elsevier
In many domains, the previous decade was characterized by increasing data volumes and
growing complexity of data analyses, creating new demands for batch processing on …

Machine learning the computational cost of quantum chemistry

S Heinen, M Schwilk, GF von Rudorff… - Machine Learning …, 2020 - iopscience.iop.org
Computational quantum mechanics based molecular and materials design campaigns
consume increasingly more high-performance computer resources, making improved job …

HPC I/O throughput bottleneck analysis with explainable local models

M Isakov, E Del Rosario, S Madireddy… - … Conference for High …, 2020 - ieeexplore.ieee.org
With the growing complexity of high-performance computing (HPC) systems, achieving high
performance can be difficult because of I/O bottlenecks. We analyze multiple years' worth of …

Machine learning predictions for underestimation of job runtime on HPC system

J Guo, A Nomura, R Barton, H Zhang… - … Frontiers: 4th Asian …, 2018 - library.oapen.org
In modern high-performance computing (HPC) systems, users are usually requested to
estimate the job runtime for system scheduling when they submit a job. In general, an …

AI4IO: A suite of AI-based tools for IO-aware scheduling

MR Wyatt, S Herbein, T Gamblin… - … International Journal of …, 2022 - journals.sagepub.com
Traditional workload managers do not have the capacity to consider how IO contention can
increase job runtime and even cause entire resource allocations to be wasted. Whether from …

HPC workload characterization using feature selection and clustering

J Bang, C Kim, K Wu, A Sim, S Byna, S Kim… - Proceedings of the 3rd …, 2020 - dl.acm.org
Large high-performance computers (HPC) are expensive tools responsible for supporting
thousands of scientific applications. However, it is not easy to determine the best set of …

An SMDP approach for Reinforcement Learning in HPC cluster schedulers

RL de Freitas Cunha, L Chaimowicz - Future Generation Computer Systems, 2023 - Elsevier
Deep reinforcement learning applied to computing systems has shown potential for
improving system performance, as well as faster discovery of better allocation strategies. In …

AMPRO-HPCC: A machine-learning tool for predicting resources on Slurm HPC clusters

M Tanash, D Andresen, W Hsu - ADVCOMP... the... International …, 2021 - ncbi.nlm.nih.gov
Determining resource allocations (memory and time) for submitted jobs in High Performance
Computing (HPC) systems is a challenging process even for computer scientists. HPC users …

Survey of memory management techniques for HPC and cloud computing

A Pupykina, G Agosta - IEEE Access, 2019 - ieeexplore.ieee.org
The emergence of new classes of HPC applications and usage models, such as real-time
HPC and cloud HPC, coupled with the increasingly heterogeneous nature of HPC …

Sizey: Memory-efficient execution of scientific workflow tasks

J Bader, F Skalski, F Lehmann… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
As the amount of available data continues to grow in fields as diverse as bioinformatics,
physics, and remote sensing, the importance of scientific workflows in the design and im …