A review on the decarbonization of high-performance computing centers

CA Silva, R Vilaça, A Pereira, RJ Bessa - Renewable and Sustainable …, 2024 - Elsevier
High-performance computing relies on performance-oriented infrastructures with access to
powerful computing resources to complete tasks that contribute to solve complex problems …

RUAD: Unsupervised anomaly detection in HPC systems

M Molan, A Borghesi, D Cesarini, L Benini… - Future Generation …, 2023 - Elsevier
The increasing complexity of modern high-performance computing (HPC) systems
necessitates the introduction of automated and data-driven methodologies to support system …

Visualizing an exascale data center digital twin: Considerations, challenges and opportunities

M Maiterth, W Brewer, D De Wet… - … and Visual Analytics …, 2024 - ieeexplore.ieee.org
Digital twins are an excellent tool to model, visualize, and simulate complex systems, to
understand and optimize their operation. In this work, we present the technical challenges of …

Prodigy: Towards unsupervised anomaly detection in production HPC systems

B Aksar, E Sencan, B Schwaller, O Aaziz… - Proceedings of the …, 2023 - dl.acm.org
Performance variations caused by anomalies in modern High Performance Computing
(HPC) systems lead to decreased efficiency, impaired application performance, and …

Graph neural networks for anomaly anticipation in HPC systems

M Molan, J Ahmed Khan, A Borghesi… - Companion of the 2023 …, 2023 - dl.acm.org
In this paper, we explore the use of Graph Neural Networks (GNNs) for anomaly anticipation
in high performance computing (HPC) systems. We propose a GNN-based approach that …

Tandem predictions for HPC jobs

K Menear, K Konate, K Potter, D Duplyakin - Practice and Experience in …, 2024 - dl.acm.org
At the core of the predictive analytics applied to High Performance Computing (HPC), the
most prominent tasks are the prediction of job runtimes and the prediction of job queue …

Monte Cimone: paving the road for the first generation of RISC-V high-performance computers

A Bartolini, F Ficarelli, E Parisi… - 2022 IEEE 35th …, 2022 - ieeexplore.ieee.org
The new open and royalty-free RISC-V ISA is attracting interest across the whole computing
continuum, from microcontrollers to supercomputers. High-performance RISC-V processors …

Analyzing HPC Monitoring Data With a View Towards Efficient Resource Utilization

S Maloney, E Suarez, N Eicker… - 2024 IEEE 36th …, 2024 - ieeexplore.ieee.org
Compute nodes in modern HPC systems are growing in size and their hardware has
become ever more diverse. Still, many HPC centers allocate the resources of full nodes …

Technical Readiness of Prescriptive Analytics Platforms: A Survey

M Niederhaus, N Migenda, J Weller… - … 35th Conference of …, 2024 - ieeexplore.ieee.org
Decision-making is the process of selecting a course of action from several alternatives on
the basis of preferences, values and available information. As decisions become …

Semi-supervised anomaly detection on a Tier-0 HPC system

M Molan, A Borghesi, L Benini, A Bartolini - Proceedings of the 19th ACM …, 2022 - dl.acm.org
Automated and data-driven methodologies are being introduced to assist system
administrators in managing increasingly complex modern HPC systems. Anomaly detection …