Data and thread placement in NUMA architectures: A statistical learning approach

Learning intermediate representations using graph neural networks for NUMA and prefetchers optimization

A TehraniJamsaz, M Popov, A Dutta… - 2022 IEEE …, 2022 - ieeexplore.ieee.org

There is a large space of NUMA and hardware prefetcher configurations that can
significantly impact the performance of an application. Previous studies have demonstrated …

Cited by 16 · Related articles · All 8 versions

[PDF] acm.org

Modeling and optimizing NUMA effects and prefetching with machine learning

I Sánchez Barrera, D Black-Schaffer, M Casas… - Proceedings of the 34th …, 2020 - dl.acm.org

Both NUMA thread/data placement and hardware prefetcher configuration have significant
impacts on HPC performance. Optimizing both together leads to a large and complex design …

Cited by 35 · Related articles · All 3 versions

[PDF] google.com

Adapt burstable containers to variable CPU resources

H Huang, Y Zhao, J Rao, S Wu, H Jin… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

In the age of the cloud-native, container technology, referred to as OS-level virtualization, is
increasingly adopted to deploy cloud applications. Compared with virtual machines …

Cited by 10 · Related articles · All 5 versions

Using machine learning to optimize graph execution on NUMA machines

HMG de A. Rocha, J Schwarzrock… - Proceedings of the 59th …, 2022 - dl.acm.org

This paper proposes PredG, a Machine Learning framework to enhance the graph
processing performance by finding the ideal thread and data mapping on NUMA systems …

Cited by 8 · Related articles

[PDF] nsf.gov

Compoff: A compiler cost model using machine learning to predict the cost of OpenMP offloading

A Mishra, S Chheda, C Soto, AM Malik… - 2022 IEEE …, 2022 - ieeexplore.ieee.org

The HPC industry is inexorably moving towards an era of extremely heterogeneous
architectures, with more devices configured on any given HPC platform and potentially more …

Cited by 8 · Related articles · All 6 versions

[PDF] hal.science

Adaptive load balancing based on machine learning for iterative parallel applications

CRAV Oikawa, V Freitas, M Castro… - 2020 28th Euromicro …, 2020 - ieeexplore.ieee.org

The performance of irregular scientific applications can be easily affected by an uneven
distribution of work among the computing resources. In this context, Load Balancing (LB) …

Cited by 11 · Related articles · All 19 versions

[PDF] arxiv.org

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

A TehraniJamsaz, A Mishra, A Dutta… - 2024 IEEE …, 2024 - ieeexplore.ieee.org

GPU-based HPC clusters are attracting more scientific application developers due to their
extensive parallelism and energy efficiency. In order to achieve portability among a variety of …

Cited by 2 · Related articles · All 3 versions

[PDF] ieee.org

Divide&Content: A fair OS-level resource manager for contention balancing on NUMA multicores

C Bilbao, JC Saez… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Chip multicore processors (CMPs) constitute the cherry-picked architecture for high-
performance servers employed in supercomputers and cloud datacenters. In the last few …

Cited by 1 · Related articles · All 5 versions

[PDF] acm.org

WASP: Workload-Aware Self-Replicating Page-Tables for NUMA Servers

H Qu, Z Yu - Proceedings of the 29th ACM International Conference …, 2024 - dl.acm.org

Recently, page-table self-replication (PTSR) has been proposed to reduce the page-table-
caused NUMA effect for large-memory workloads on NUMA servers. However, PTSR may …

Cited by 3 · Related articles

[HTML] mdpi.com

Performance Study of an MRI Motion-Compensated Reconstruction Program on Intel CPUs, AMD EPYC CPUs, and NVIDIA GPUs

MA Zeroual, K Isaieva, PA Vuissoz, F Odille - Applied Sciences, 2024 - mdpi.com

Motion-compensated image reconstruction enables new clinical applications of Magnetic
Resonance Imaging (MRI), but it relies on computationally intensive algorithms. This study …
