Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning

B Aksar, E Sencan, B Schwaller, O Aaziz… - … on Parallel and …, 2024 - ieeexplore.ieee.org
With the increasing scale and complexity of High-Performance Computing (HPC) systems,
performance variations in applications caused by anomalies have become significant …

Machine learning-based performance analytics for high-performance computing systems

B Aksar - 2024 - search.proquest.com
High-performance Computing (HPC) systems play pivotal roles in societal and scientific
advancements, executing up to quintillions (10^18) of calculations every second. As we shift …