Comparing runtime systems with exascale ambitions using the parallel research kernels

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org

The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Cited by 64 · Related articles · All 4 versions

Task Bench: A parameterized benchmark for evaluating parallel runtime performance

E Slaughter, W Wu, Y Fu, L Brandenburg… - … Conference for High …, 2020 - ieeexplore.ieee.org

We present Task Bench, a parameterized benchmark designed to explore the performance
of distributed programming systems under a variety of application scenarios. Task Bench …

Cited by 58 · Related articles · All 14 versions

Quantifying Overheads in Charm++ and HPX Using Task Bench

N Wu, I Gonidelis, S Liu, Z Fink, N Gupta… - … Conference on Parallel …, 2022 - Springer

Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core
architectures with light-weight threads, asynchronous executions, and smart scheduling. In …

Cited by 6 · Related articles · All 7 versions

Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java

P Diehl, M Morris, SR Brandt, N Gupta… - European Conference on …, 2023 - Springer

Many scientific high performance codes that simulate e.g. black holes, coastal waves, climate
and weather, etc. rely on block-structured meshes and use finite differencing methods to …

Cited by 4 · Related articles · All 4 versions

Control Replication: Compiling implicit parallelism to efficient SPMD with logical regions

E Slaughter, W Lee, S Treichler, W Zhang… - Proceedings of the …, 2017 - dl.acm.org

We present control replication, a technique for generating high-performance and scalable
SPMD code from implicitly parallel programs. In contrast to traditional parallel programming …

Cited by 20 · Related articles · All 10 versions

Benchmarking Fortran DO CONCURRENT on CPUs and GPUs using BabelStream

JR Hammond, T Deakin, J Cownie… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org

Fortran DO CONCURRENT has emerged as a new way to achieve parallel execution of
loops on CPUs and GPUs. This paper studies the performance portability of this construct on …

Cited by 6 · Related articles · All 6 versions

LAPPS: Locality-aware productive prefetching support for PGAS

E Kayraklioglu, MP Ferguson… - ACM Transactions on …, 2018 - dl.acm.org

Prefetching is a well-known technique to mitigate scalability challenges in the Partitioned
Global Address Space (PGAS) model. It has been studied as either an automated compiler …

Cited by 15 · Related articles · All 3 versions

FlipBack: automatic targeted protection against silent data corruption

X Ni, LV Kale - SC'16: Proceedings of the International …, 2016 - ieeexplore.ieee.org

The decreasing size of transistors has been critical to the increase in capacity of
supercomputers. The smaller the transistors are, less energy is required to flip a bit, and thus …

Cited by 20 · Related articles · All 7 versions

CAMP: a Synthetic Micro-Benchmark for Assessing Deep Memory Hierarchies

W Peng, E Belikov - 2022 IEEE/ACM International Workshop on …, 2022 - ieeexplore.ieee.org

This paper presents CAMP, a Configurable App for Memory Probing that facilitates
assessment of intra-node deep memory hierarchies through performance measurements of …

Cited by 2 · Related articles · All 3 versions

Comparative performance and optimization of Chapel in modern manycore architectures

E Kayraklioglu, W Chang… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

Chapel is an emerging scalable, productive parallel programming language. In this work, we
analyze Chapel's performance using The Parallel Research Kernels on two different …

Cited by 12 · Related articles · All 6 versions
