A heuristic clustering-based task deployment approach for load balancing using Bayes theorem in cloud environment

J Zhao, K Yang, X Wei, Y Ding, L Hu… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Aiming at the current problems that most physical hosts in the cloud data center are so
overloaded that it makes the whole cloud data center's load imbalanced and that existing load …

Quantifying performance bottlenecks of stencil computations using the execution-cache-memory model

H Stengel, J Treibig, G Hager, G Wellein - Proceedings of the 29th ACM …, 2015 - dl.acm.org
Stencil algorithms on regular lattices appear in many fields of computational science, and
much effort has been put into optimized implementations. Such activities are usually not …

OpenCL-based FPGA-platform for stencil computation and its optimization methodology

HM Waidyasooriya, Y Takei, S Tatsumi… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Stencil computation is widely used in scientific computations and many accelerators based
on multicore CPUs and GPUs have been proposed. Stencil computation has a small …

Suitability analysis of FPGAs for heterogeneous platforms in HPC

FA Escobar, X Chang… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
High performance computing (HPC) systems currently integrate several resources such as
multi-cores (CPUs), graphic processing units (GPUs) and reconfigurable logic devices, like …

On how to accelerate iterative stencil loops: a scalable streaming-based approach

R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org
In high-performance systems, stencil computations play a crucial role as they appear in a
variety of different fields of application, ranging from partial differential equation solving, to …

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

M Sourouri, SB Baden, X Cai - International Journal of Parallel …, 2017 - Springer
We present a new compiler framework for truly heterogeneous 3D stencil computation on
GPU clusters. Our framework consists of a simple directive-based programming model and a …

DCMI: A scalable strategy for accelerating iterative stencil loops on FPGAs

M Koraei, O Fatemi, M Jahre - ACM Transactions on Architecture and …, 2019 - dl.acm.org
Iterative Stencil Loops (ISLs) are the key kernel within a range of compute-intensive
applications. To accelerate ISLs with Field Programmable Gate Arrays, it is critical to exploit …

Toward an evolutionary task parallel integrated MPI+X programming model

RF Barrett, DT Stark, CT Vaughan, RE Grant… - Proceedings of the …, 2015 - dl.acm.org
The Bulk Synchronous Parallel programming model is showing performance limitations at
high processor counts. We propose over-decomposition of the domain, operated on as …

Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs

P Di, D Ye, Y Su, Y Sui, J Xue - 2012 41st International …, 2012 - ieeexplore.ieee.org
Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of
GPUs to obtain high performance. One state-of-the-art approach makes use of the …

Performance prediction of finite-difference solvers for different computer architectures

M Louboutin, M Lange, FJ Herrmann, N Kukreja… - Computers & …, 2017 - Elsevier
The life-cycle of a partial differential equation (PDE) solver is often characterized by three
development phases: the development of a stable numerical discretization; development of …