The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org
The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Task Bench: A parameterized benchmark for evaluating parallel runtime performance

E Slaughter, W Wu, Y Fu, L Brandenburg… - … Conference for High …, 2020 - ieeexplore.ieee.org
We present Task Bench, a parameterized benchmark designed to explore the performance
of distributed programming systems under a variety of application scenarios. Task Bench …

Quantifying Overheads in Charm++ and HPX Using Task Bench

N Wu, I Gonidelis, S Liu, Z Fink, N Gupta… - … Conference on Parallel …, 2022 - Springer
Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core
architectures with light-weight threads, asynchronous executions, and smart scheduling. In …

Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java

P Diehl, M Morris, SR Brandt, N Gupta… - European Conference on …, 2023 - Springer
Many scientific high-performance codes that simulate, e.g., black holes, coastal waves, climate
and weather, etc. rely on block-structured meshes and use finite differencing methods to …

Control Replication: Compiling implicit parallelism to efficient SPMD with logical regions

E Slaughter, W Lee, S Treichler, W Zhang… - Proceedings of the …, 2017 - dl.acm.org
We present control replication, a technique for generating high-performance and scalable
SPMD code from implicitly parallel programs. In contrast to traditional parallel programming …

Benchmarking Fortran DO CONCURRENT on CPUs and GPUs using BabelStream

JR Hammond, T Deakin, J Cownie… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
Fortran DO CONCURRENT has emerged as a new way to achieve parallel execution of
loops on CPUs and GPUs. This paper studies the performance portability of this construct on …

LAPPS: Locality-aware productive prefetching support for PGAS

E Kayraklioglu, MP Ferguson… - ACM Transactions on …, 2018 - dl.acm.org
Prefetching is a well-known technique to mitigate scalability challenges in the Partitioned
Global Address Space (PGAS) model. It has been studied as either an automated compiler …

FlipBack: automatic targeted protection against silent data corruption

X Ni, LV Kale - SC'16: Proceedings of the International …, 2016 - ieeexplore.ieee.org
The decreasing size of transistors has been critical to the increase in capacity of
supercomputers. The smaller the transistors are, the less energy is required to flip a bit, and thus …

CAMP: a Synthetic Micro-Benchmark for Assessing Deep Memory Hierarchies

W Peng, E Belikov - 2022 IEEE/ACM International Workshop on …, 2022 - ieeexplore.ieee.org
This paper presents CAMP, a Configurable App for Memory Probing that facilitates
assessment of intra-node deep memory hierarchies through performance measurements of …

Comparative performance and optimization of Chapel in modern manycore architectures

E Kayraklioglu, W Chang… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Chapel is an emerging scalable, productive parallel programming language. In this work, we
analyze Chapel's performance using The Parallel Research Kernels on two different …