Introducing the semi-stencil algorithm

T Henretty, K Stock, LN Pouchet, F Franchetti… - … CC 2011, Held as Part of …, 2011 - Springer

Stencil computations are at the core of applications in many domains such as computational
electromagnetics, image processing, and partial differential equation solvers used in a …

Cited by 174 · Related articles · All 22 versions

Register optimizations for stencils on GPUs

PS Rawat, F Rastello, A Sukumaran-Rajam… - Proceedings of the 23rd …, 2018 - dl.acm.org

The recent advent of compute-intensive GPU architecture has allowed application
developers to explore high-order 3D stencils for better computational accuracy. A common …

Cited by 67 · Related articles · All 5 versions

A framework for enhancing data reuse via associative reordering

K Stock, M Kong, T Grosser, LN Pouchet… - Proceedings of the 35th …, 2014 - dl.acm.org

The freedom to reorder computations involving associative operators has been widely
recognized and exploited in designing parallel algorithms and to a more limited extent in …

Cited by 88 · Related articles · All 10 versions

SPARTA: spatial acceleration for efficient and scalable horizontal diffusion weather stencil computation

G Singh, A Khodamoradi, K Denolf, J Lo… - Proceedings of the 37th …, 2023 - dl.acm.org

Fast and accurate climate simulations and weather predictions are critical for understanding
and preparing for the impact of climate change. Real-world climate and weather simulations …

Cited by 11 · Related articles · All 5 versions

Assessing accelerator-based HPC reverse time migration

M Araya-Polo, J Cabezas, M Hanzich… - … on Parallel and …, 2010 - ieeexplore.ieee.org

Oil and gas companies trust Reverse Time Migration (RTM), the most advanced seismic
imaging technique, with crucial decisions on drilling investments. The economic value of the …

Cited by 96 · Related articles · All 8 versions

Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing

A Lastovetsky, L Szustak… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

Load balancing is a widely accepted technique for performance optimization of scientific
applications on parallel architectures. Indeed, balanced applications do not waste processor …

Cited by 53 · Related articles · All 6 versions

Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs

T Zhao, P Basu, S Williams, M Hall… - Proceedings of the …, 2019 - dl.acm.org

Stencil computations in real-world scientific applications may contain multiple interrelated
stencils, have multiple input grids, and use higher order discretizations with high arithmetic …

Cited by 41 · Related articles · All 2 versions

Bricks: A high-performance portability layer for computations on block-structured grids

M Lakshminarasimhan, O Antepara… - … Journal of High …, 2024 - journals.sagepub.com

From partial differential equations to the convolutional neural networks in deep learning, to
matrix operations in dense linear algebra, computations on structured grids dominate high …

Cited by 1 · Related articles · All 2 versions

Accelerating high-order stencils on GPUs

R Sai, J Mellor-Crummey, X Meng… - 2020 IEEE/ACM …, 2020 - ieeexplore.ieee.org

While implementation strategies for low-order stencils on GPUs have been well-studied in
the literature, not all of the techniques work well for high-order stencils, such as those used …

Cited by 22 · Related articles · All 10 versions

Compiler-directed transformation for higher-order stencils

P Basu, M Hall, S Williams… - 2015 IEEE …, 2015 - ieeexplore.ieee.org

As the cost of data movement increasingly dominates performance, developers of finite-
volume and finite-difference solutions for partial differential equations (PDEs) are exploring …

Cited by 49 · Related articles · All 8 versions
