Jobstats: A Slurm-Compatible Job Monitoring Platform for CPU and GPU Clusters

J Plazonic, J Halverson, T Comi - Practice and Experience in Advanced …, 2023 - dl.acm.org
Job monitoring on high-performance computing clusters is important for evaluating
hardware performance, troubleshooting failed jobs, identifying inefficient jobs and more. The …

Applying process mining on scientific workflows: a case study

Z Sadeghibogar, A Berti, M Pegoraro… - arXiv preprint arXiv …, 2023 - arxiv.org
Computer-based scientific experiments are becoming increasingly data-intensive.
High-Performance Computing (HPC) clusters are ideal for executing large scientific experiment …

SLURMminer: A Tool for SLURM System Analysis with Process Mining.

Z Sadeghibogar, A Berti, M Pegoraro… - BPM (Demos …, 2023 - ceur-ws.org
SLURMminer is a tool designed to analyze SLURM systems in High-Performance
Computing (HPC) clusters. It utilizes process mining techniques to generate event logs …

An Integrated Job Monitor, Analyzer and Predictor

A Pal, P Malakar - 2021 IEEE International Conference on …, 2021 - ieeexplore.ieee.org
High performance computing systems are used for compute-intensive jobs by multiple users.
The users submit jobs to batch queues where the jobs are queued for an unknown amount …

HPC Job-Monitoring with SLURM, Prometheus and Grafana

P Kunz - 2022 - hpc.dmi.unibas.ch
An important step in increasing the observability and efficient usage of computational power
on High Performance Computers is to capture data about running jobs, consequently storing …

Real-Time GPU Usage Alert Service on Pre-Exascale HPC Clusters

S Jiang - 2024 - openreview.net
Improving observability in large-scale distributed computing clusters has always been a
complex problem, particularly in High-Performance Computing (HPC). Despite the growing …

The Jobstats Job Monitoring Platform for CPU and GPU Clusters

J Plazonic, J Halverson - Practice and Experience in Advanced …, 2024 - dl.acm.org
In 2023, we introduced the Jobstats job monitoring platform [9] which provides user-facing
commands and interfaces for inspecting the efficiency of Slurm jobs on CPU and GPU …