H Stengel, J Treibig, G Hager, G Wellein - Proceedings of the 29th ACM …, 2015 - dl.acm.org
Stencil algorithms on regular lattices appear in many fields of computational science, and much effort has been put into optimized implementations. Such activities are usually not …
HM Waidyasooriya, Y Takei, S Tatsumi… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Stencil computation is widely used in scientific computations, and many accelerators based on multicore CPUs and GPUs have been proposed. Stencil computation has a small …
FA Escobar, X Chang… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
High-performance computing (HPC) systems currently integrate several resources such as multi-cores (CPUs), graphics processing units (GPUs) and reconfigurable logic devices, like …
R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org
In high-performance systems, stencil computations play a crucial role as they appear in a variety of different fields of application, ranging from partial differential equation solving, to …
We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a …
M Koraei, O Fatemi, M Jahre - ACM Transactions on Architecture and …, 2019 - dl.acm.org
Iterative Stencil Loops (ISLs) are the key kernel within a range of compute-intensive applications. To accelerate ISLs with Field Programmable Gate Arrays, it is critical to exploit …
RF Barrett, DT Stark, CT Vaughan, RE Grant… - Proceedings of the …, 2015 - dl.acm.org
The Bulk Synchronous Parallel programming model is showing performance limitations at high processor counts. We propose over-decomposition of the domain, operated on as …
P Di, D Ye, Y Su, Y Sui, J Xue - 2012 41st International …, 2012 - ieeexplore.ieee.org
Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach makes use of the …
The life-cycle of a partial differential equation (PDE) solver is often characterized by three development phases: the development of a stable numerical discretization; development of …