Data locality in high performance computing, big data, and converged systems: An analysis of the cutting edge and a future system architecture

S Usman, R Mehmood, I Katib, A Albeshri - Electronics, 2022 - mdpi.com
Big data has revolutionized science and technology leading to the transformation of our
societies. High-performance computing (HPC) provides the necessary computational power …

Advanced synchronization techniques for task-based runtime systems

D Álvarez, K Sala, M Maroñas, A Roca… - Proceedings of the 26th …, 2021 - dl.acm.org
Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow
execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient …

Locality-centric data and threadblock management for massive GPUs

M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …

DuctTeip: An efficient programming model for distributed task-based parallel computing

A Zafari, E Larsson, M Tillenius - Parallel Computing, 2019 - Elsevier
Current high-performance computer systems used for scientific computing typically combine
shared memory computational nodes in a distributed memory environment. Extracting high …

Optimizing iterative data-flow scientific applications using directed cyclic graphs

D Álvarez, V Beltran - IEEE Access, 2023 - ieeexplore.ieee.org
Data-flow programming models have become a popular choice for writing parallel
applications as an alternative to traditional work-sharing parallelism. They are better suited …

PufferFish: NUMA-aware work-stealing library using elastic tasks

V Kumar - 2020 IEEE 27th International Conference on High …, 2020 - ieeexplore.ieee.org
Due to the challenges in providing adequate memory access to many cores on a single
processor, Multi-Die and Multi-Socket based multicore systems are becoming mainstream …

Scalable tasking runtime with parallelized builders for explicit message passing architectures

L Chen, X Gao, H Wang, H Cui, X Feng - Parallel Computing, 2024 - Elsevier
The sequential task flow (STF) model introduces implicit data dependences to exploit
task-based parallelism, simplifying programming but also introducing non-negligible runtime …

Blaze-Tasks: A framework for computing parallel reductions over tasks

P Pirkelbauer, A Wilson, C Peterson… - ACM Transactions on …, 2019 - dl.acm.org
Compared to threads, tasks are a more fine-grained alternative. The task parallel
programming model offers benefits in terms of better performance portability and better load …

A dataflow IR for memory efficient RIPL compilation to FPGAs

R Stewart, G Michaelson, D Bhowmik, P Garcia… - … and Architectures for …, 2016 - Springer
Field programmable gate arrays (FPGAs) are fundamentally different to fixed processors
architectures because their memory hierarchies can be tailored to the needs of an algorithm …

Approaches for task affinity in OpenMP

C Terboven, J Hahnfeld, X Teruel, S Mateo… - … : Memory, Devices, and …, 2016 - Springer
OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP
specifications extended tasking to increase functionality and to support optimizations, for …