Learning intermediate representations using graph neural networks for NUMA and prefetchers optimization

A TehraniJamsaz, M Popov, A Dutta… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
There is a large space of NUMA and hardware prefetcher configurations that can
significantly impact the performance of an application. Previous studies have demonstrated …

Modeling and optimizing NUMA effects and prefetching with machine learning

I Sánchez Barrera, D Black-Schaffer, M Casas… - Proceedings of the 34th …, 2020 - dl.acm.org
Both NUMA thread/data placement and hardware prefetcher configuration have significant
impacts on HPC performance. Optimizing both together leads to a large and complex design …

Adapt burstable containers to variable CPU resources

H Huang, Y Zhao, J Rao, S Wu, H Jin… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
In the age of the cloud-native, container technology, referred as OS-level virtualization, is
increasingly adopted to deploy cloud applications. Compared with virtual machines …

Using machine learning to optimize graph execution on NUMA machines

HMG de A. Rocha, J Schwarzrock… - Proceedings of the 59th …, 2022 - dl.acm.org
This paper proposes PredG, a Machine Learning framework to enhance the graph
processing performance by finding the ideal thread and data mapping on NUMA systems …

COMPOFF: A compiler cost model using machine learning to predict the cost of OpenMP offloading

A Mishra, S Chheda, C Soto, AM Malik… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
The HPC industry is inexorably moving towards an era of extremely heterogeneous
architectures, with more devices configured on any given HPC platform and potentially more …

Adaptive load balancing based on machine learning for iterative parallel applications

CRAV Oikawa, V Freitas, M Castro… - 2020 28th Euromicro …, 2020 - ieeexplore.ieee.org
The performance of irregular scientific applications can be easily affected by an uneven
distribution of work among the computing resources. In this context, Load Balancing (LB) …

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

A TehraniJamsaz, A Mishra, A Dutta… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
GPU-based HPC clusters are attracting more scientific application developers due to their
extensive parallelism and energy efficiency. In order to achieve portability among a variety of …

Divide&Content: A fair OS-level resource manager for contention balancing on NUMA multicores

C Bilbao, JC Saez… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Chip multicore processors (CMPs) constitute the cherry-picked architecture for high-
performance servers employed in supercomputers and cloud datacenters. In the last few …

WASP: Workload-Aware Self-Replicating Page-Tables for NUMA Servers

H Qu, Z Yu - Proceedings of the 29th ACM International Conference …, 2024 - dl.acm.org
Recently, page-table self-replication (PTSR) has been proposed to reduce the page-table
caused NUMA effect for large-memory workloads on NUMA servers. However, PTSR may …

Performance Study of an MRI Motion-Compensated Reconstruction Program on Intel CPUs, AMD EPYC CPUs, and NVIDIA GPUs

MA Zeroual, K Isaieva, PA Vuissoz, F Odille - Applied Sciences, 2024 - mdpi.com
Motion-compensated image reconstruction enables new clinical applications of Magnetic
Resonance Imaging (MRI), but it relies on computationally intensive algorithms. This study …