The recent advent of compute-intensive GPU architecture has allowed application developers to explore high-order 3D stencils for better computational accuracy. A common …
The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and, to a more limited extent, in …
Fast and accurate climate simulations and weather predictions are critical for understanding and preparing for the impact of climate change. Real-world climate and weather simulations …
Oil and gas companies trust Reverse Time Migration (RTM), the most advanced seismic imaging technique, with crucial decisions on drilling investments. The economic value of the …
Load balancing is a widely accepted technique for performance optimization of scientific applications on parallel architectures. Indeed, balanced applications do not waste processor …
Stencil computations in real-world scientific applications may contain multiple interrelated stencils, have multiple input grids, and use higher order discretizations with high arithmetic …
From partial differential equations to the convolutional neural networks in deep learning, to matrix operations in dense linear algebra, computations on structured grids dominate high …
While implementation strategies for low-order stencils on GPUs have been well studied in the literature, not all of the techniques work well for high-order stencils, such as those used …
As the cost of data movement increasingly dominates performance, developers of finite-volume and finite-difference solutions for partial differential equations (PDEs) are exploring …