Stellar mergers with HPX-Kokkos and SYCL: methods of using an asynchronous many-task runtime system with SYCL

G Daiß, P Diehl, H Kaiser, D Pflüger - Proceedings of the 2023 …, 2023 - dl.acm.org
Ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs: Given the heterogeneity of
available accelerator cards within current supercomputers, portability is a key aspect for …

From task-based GPU work aggregation to stellar mergers: Turning fine-grained CPU tasks into portable GPU kernels

G Daiß, P Diehl, D Marcello… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
Meeting both scalability and performance portability requirements is a challenge for any
HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics …

Simulating stellar merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

P Diehl, G Daiß, K Huck, D Marcello, S Shiber… - The Journal of …, 2024 - Springer
The increasing availability of machines relying on non-GPU architectures, such as ARM
A64FX in high-performance computing, provides a set of interesting challenges to …

Asynchronous-Many-Task Systems: Challenges and Opportunities--Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX

G Daiß, P Diehl, J Yan, JK Holmen, R Gayatri… - arXiv preprint arXiv …, 2024 - arxiv.org
Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics,
multi-model simulations, necessitating precise physics resolution in localized areas across …

From merging frameworks to merging stars: experiences using HPX, Kokkos and SIMD Types

G Daiß, SY Singanaboina, P Diehl… - 2022 IEEE/ACM 7th …, 2022 - ieeexplore.ieee.org
Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX,
Kokkos and explicit SIMD types, aiming to achieve performance-portability for a broad range …

Making Uintah Performance Portable for Department of Energy Exascale Testbeds

JK Holmen, M García, A Bagusetty… - … Conference on Parallel …, 2023 - Springer
To help ease ports to forthcoming Department of Energy (DOE) exascale systems, testbeds
have been made available to select users. These testbeds are helpful for preparing codes to …

Distributed, combined CPU and GPU profiling within HPX using APEX

P Diehl, G Daiss, K Huck, D Marcello, S Shiber… - arXiv preprint arXiv …, 2022 - arxiv.org
Benchmarking and comparing performance of a scientific simulation across hardware
platforms is a complex task. When the simulation in question is constructed with an …

View-aware Message Passing Through the Integration of Kokkos and ExaMPI

E Suggs, S Olivier, J Ciesko, A Skjellum - Proceedings of the 30th …, 2023 - dl.acm.org
Kokkos provides in-memory advanced data structures, concurrency, and algorithms to
support performance portable C++ parallel programming across CPUs and GPUs. The …

Enhancing Asynchronous Many-Task Runtime Systems for Next-Generation Architectures and Exascale Supercomputers

D Sahasrabudhe - 2021 - search.proquest.com
Exascale supercomputers capable of computing 10^18 double-precision floating point
operations per second are expected to be operational around 2022/23. The complexity and …

Portable, scalable approaches for improving asynchronous many-task runtime node use

JK Holmen - 2022 - search.proquest.com
This research addresses node-level scalability, portability, and heterogeneous computing
challenges facing asynchronous many-task (AMT) runtime systems. These challenges have …