Grains3D, a flexible DEM approach for particles of arbitrary convex shape - Part II: Parallel implementation and scalable performance

AD Rakotonirina, A Wachs - Powder technology, 2018 - Elsevier
In [1] we suggested an original Discrete Element Method that offers the capability to consider
non-spherical particles of arbitrary convex shape. We elaborated on the foundations of our …

Hybrid CPU–GPU execution support in the skeleton programming framework SkePU

T Öhberg, A Ernstsson, C Kessler - The Journal of Supercomputing, 2020 - Springer
In this paper, we present a hybrid execution backend for the skeleton programming
framework SkePU. The backend is capable of automatically dividing the workload and …

Controllers: an abstraction to ease the use of hardware accelerators

A Moreton–Fernandez… - … Journal of High …, 2018 - journals.sagepub.com
Nowadays the use of hardware accelerators, such as the graphics processing units or
XeonPhi coprocessors, is key in solving computationally costly problems that require high …

OpenH: A Novel Programming Model and API for Developing Portable Parallel Programs on Heterogeneous Hybrid Servers

S Farrelly, RR Manumachu, A Lastovetsky - IEEE Access, 2024 - ieeexplore.ieee.org
Heterogeneous nodes composed of a multicore CPU and accelerators are today's norm in
high-performance computing (HPC) platforms due to their superior performance and energy …

SkelCL: a high-level extension of OpenCL for multi-GPU systems

M Steuwer, S Gorlatch - The Journal of Supercomputing, 2014 - Springer
Application development for modern high-performance systems with graphics processing
units (GPUs) currently relies on low-level programming approaches like CUDA and …

Supporting multiple accelerators in high-level programming models

Y Yan, PH Lin, C Liao, BR de Supinski… - Proceedings of the Sixth …, 2015 - dl.acm.org
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs,
are becoming common in work-stations, servers and supercomputers for scientific and …

Execution of compound multi‐kernel OpenCL computations in multi‐CPU/multi‐GPU environments

F Soldado, F Alexandre… - … and Computation: Practice …, 2016 - Wiley Online Library
Current computational systems are heterogeneous by nature, featuring a combination of
CPUs and graphics processing units (GPUs). As the latter are becoming an established …

A composable array function interface for heterogeneous computing in Java

JJ Fumero, M Steuwer, C Dubach - Proceedings of ACM SIGPLAN …, 2014 - dl.acm.org
Heterogeneous computing has now become mainstream with virtually every desktop
machine featuring accelerators such as Graphics Processing Units (GPUs). While …

A parallel pattern for iterative stencil + reduce

M Aldinucci, M Danelutto, M Drocco, P Kilpatrick… - The Journal of …, 2018 - Springer
We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the
implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of …

High-level programming of stencil computations on multi-GPU systems using the SkelCL library

M Steuwer, M Haidl, S Breuer… - Parallel Processing …, 2014 - World Scientific
The implementation of stencil computations on modern, massively parallel systems with
GPUs and other accelerators currently relies on manually-tuned coding using low-level …