Data locality in high performance computing, big data, and converged systems: An analysis of the cutting edge and a future system architecture

S Usman, R Mehmood, I Katib, A Albeshri - Electronics, 2022 - mdpi.com
Big data has revolutionized science and technology leading to the transformation of our
societies. High-performance computing (HPC) provides the necessary computational power …

Machine learning in compiler optimization

Z Wang, M O'Boyle - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
In the last decade, machine-learning-based compilation has moved from an obscure
research niche to a mainstream activity. In this paper, we describe the relationship between …

Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review

S Memeti, S Pllana, A Binotto, J Kołodziej, I Brandic - Computing, 2019 - Springer
While modern parallel computing systems offer high performance, utilizing these powerful
computing resources to the highest possible extent demands advanced knowledge of …

Learning intermediate representations using graph neural networks for NUMA and prefetchers optimization

A TehraniJamsaz, M Popov, A Dutta… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
There is a large space of NUMA and hardware prefetcher configurations that can
significantly impact the performance of an application. Previous studies have demonstrated …

Modeling and optimizing NUMA effects and prefetching with machine learning

I Sánchez Barrera, D Black-Schaffer, M Casas… - Proceedings of the 34th …, 2020 - dl.acm.org
Both NUMA thread/data placement and hardware prefetcher configuration have significant
impacts on HPC performance. Optimizing both together leads to a large and complex design …

Compiler support for selective page migration in NUMA architectures

G Piccoli, HN Santos, RE Rodrigues, C Pousa… - Proceedings of the 23rd …, 2014 - dl.acm.org
Current high-performance multicore processors provide users with a non-uniform memory
access model (NUMA). These systems perform better when threads access data on memory …

Machine learning-based self-adjusting concurrency in software transactional memory systems

D Rughetti, P Di Sanzo, B Ciciani… - 2012 IEEE 20th …, 2012 - ieeexplore.ieee.org
One of the problems of Software-Transactional-Memory (STM) systems is the performance
degradation that can be experienced when applications run with a non-optimal concurrency …

Data and thread placement in NUMA architectures: A statistical learning approach

N Denoyelle, B Goglin, E Jeannot… - Proceedings of the 48th …, 2019 - dl.acm.org
Nowadays, NUMA architectures are common in compute-intensive systems. Achieving high
performance for multi-threaded applications requires both a careful placement of threads on …

ZAKI+: A machine learning based process mapping tool for SpMV computations on distributed memory architectures

S Usman, R Mehmood, I Katib, A Albeshri - IEEE Access, 2019 - ieeexplore.ieee.org
Smart cities and other cyber-physical systems (CPSs) rely on various scientific, engineering,
business, and social applications that provide timely intelligence for their design, operations …

MIREncoder: Multi-modal IR-based pretrained embeddings for performance optimizations

A Dutta, A Jannesari - Proceedings of the 2024 International Conference …, 2024 - dl.acm.org
One of the primary areas of interest in High Performance Computing is the improvement of
performance of parallel workloads. Nowadays, compilable source code-based optimization …