Understanding stencil code performance on multicore architectures

A heuristic clustering-based task deployment approach for load balancing using Bayes theorem in cloud environment

J Zhao, K Yang, X Wei, Y Ding, L Hu… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org

Aiming at the current problems that most physical hosts in the cloud data center are so
overloaded that it makes the whole cloud data center's load imbalanced and that existing load …

Cited by 215 · Related articles · All 6 versions

[PDF] arxiv.org

Quantifying performance bottlenecks of stencil computations using the execution-cache-memory model

H Stengel, J Treibig, G Hager, G Wellein - Proceedings of the 29th ACM …, 2015 - dl.acm.org

Stencil algorithms on regular lattices appear in many fields of computational science, and
much effort has been put into optimized implementations. Such activities are usually not …

Cited by 144 · Related articles · All 7 versions

OpenCL-based FPGA-platform for stencil computation and its optimization methodology

HM Waidyasooriya, Y Takei, S Tatsumi… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

Stencil computation is widely used in scientific computations and many accelerators based
on multicore CPUs and GPUs have been proposed. Stencil computation has a small …

Cited by 98 · Related articles · All 5 versions

Suitability analysis of FPGAs for heterogeneous platforms in HPC

FA Escobar, X Chang… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org

High performance computing (HPC) systems currently integrate several resources such as
multi-cores (CPUs), graphic processing units (GPUs) and reconfigurable logic devices, like …

Cited by 64 · Related articles · All 3 versions

[PDF] acm.org

On how to accelerate iterative stencil loops: a scalable streaming-based approach

R Cattaneo, G Natale, C Sicignano, D Sciuto… - ACM Transactions on …, 2015 - dl.acm.org

In high-performance systems, stencil computations play a crucial role as they appear in a
variety of different fields of application, ranging from partial differential equation solving, to …

Cited by 49 · Related articles · All 5 versions

[PDF] escholarship.org

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

M Sourouri, SB Baden, X Cai - International Journal of Parallel …, 2017 - Springer

We present a new compiler framework for truly heterogeneous 3D stencil computation on
GPU clusters. Our framework consists of a simple directive-based programming model and a …

Cited by 31 · Related articles · All 9 versions

[PDF] acm.org

DCMI: A scalable strategy for accelerating iterative stencil loops on FPGAs

M Koraei, O Fatemi, M Jahre - ACM Transactions on Architecture and …, 2019 - dl.acm.org

Iterative Stencil Loops (ISLs) are the key kernel within a range of compute-intensive
applications. To accelerate ISLs with Field Programmable Gate Arrays, it is critical to exploit …

Cited by 20 · Related articles · All 4 versions

[PDF] osti.gov

Toward an evolutionary task parallel integrated MPI+X programming model

RF Barrett, DT Stark, CT Vaughan, RE Grant… - Proceedings of the …, 2015 - dl.acm.org

The Bulk Synchronous Parallel programming model is showing performance limitations at
high processor counts. We propose over-decomposition of the domain, operated on as …

Cited by 40 · Related articles · All 4 versions

[PDF] academia.edu

Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs

P Di, D Ye, Y Su, Y Sui, J Xue - 2012 41st International …, 2012 - ieeexplore.ieee.org

Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of
GPUs to obtain high performance. One state-of-the-art approach makes use of the …

Cited by 37 · Related articles · All 12 versions

[HTML] sciencedirect.com

Performance prediction of finite-difference solvers for different computer architectures

M Louboutin, M Lange, FJ Herrmann, N Kukreja… - Computers & …, 2017 - Elsevier

The life-cycle of a partial differential equation (PDE) solver is often characterized by three
development phases: the development of a stable numerical discretization; development of …

Cited by 25 · Related articles · All 8 versions