S Seo, A Amer, P Balaji, C Bordage… - … on Parallel and …, 2017 - ieeexplore.ieee.org
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with …
FX Lin, X Liu - ACM SIGPLAN Notices, 2016 - dl.acm.org
To harness a heterogeneous memory hierarchy, it is advantageous to integrate application knowledge in guiding frequent memory move, i.e., replicating or migrating virtual memory …
Almost all of today's microprocessors contain memory controllers and directly attach to memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is …
There is a large space of NUMA and hardware prefetcher configurations that can significantly impact the performance of an application. Previous studies have demonstrated …
M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip will not be practical due to slowing growth in transistor density, low chip yields, and …
Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HPC performance. Optimizing both together leads to a large and complex design …
Current multi-socket systems have complex memory hierarchies with significant Non-Uniform Memory Access (NUMA) effects: memory performance depends on the location of …
Non Uniform Memory Access (NUMA) architectures are nowadays common for running High-Performance Computing (HPC) applications. In such architectures, several distinct physical …
Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. Yet these promises are …