Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing

A Lastovetsky, L Szustak… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Load balancing is a widely accepted technique for performance optimization of scientific
applications on parallel architectures. Indeed, balanced applications do not waste processor …

Ultra-scalable CPU-MIC acceleration of mesoscale atmospheric modeling on Tianhe-2

W Xue, C Yang, H Fu, X Wang, Y Xu… - IEEE Transactions …, 2014 - ieeexplore.ieee.org
In this work an ultra-scalable algorithm is designed and optimized to accelerate a 3D
compressible Euler atmospheric model on the CPU-MIC hybrid system of Tianhe-2. We first …

Stencil codes on a vector length agnostic architecture

A Armejach, H Caminal, JM Cebrian… - Proceedings of the 27th …, 2018 - dl.acm.org
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD
capabilities, it can provide substantial performance improvements on top of widely used …

Using Arm's scalable vector extension on stencil codes

A Armejach, H Caminal, JM Cebrian… - The Journal of …, 2020 - Springer
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD
capabilities, it can provide substantial performance improvements on top of widely used …

Adaptation of MPDATA heterogeneous stencil computation to Intel Xeon Phi coprocessor

L Szustak, K Rojek, T Olas, L Kuczynski… - Scientific …, 2015 - Wiley Online Library
The multidimensional positive definite advection transport algorithm (MPDATA) belongs to
the group of nonoscillatory forward‐in‐time algorithms and performs a sequence of stencil …

Porting and optimization of solidification application for CPU–MIC hybrid platforms

L Szustak, K Halbiniak, L Kuczynski… - … Journal of High …, 2018 - journals.sagepub.com
Modern heterogeneous computing platforms have become powerful HPC solutions, which
could be applied to a wide range of real-life applications. In particular, the hybrid platforms …

Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors

L Szustak, P Bratek - The International Journal of High …, 2019 - journals.sagepub.com
In this work, we take up the challenge of performance portable programming of
heterogeneous stencil computations across a wide range of modern shared-memory …

Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

L Szustak, K Halbiniak, R Wyrzykowski… - The Journal of …, 2019 - Springer
This paper meets the challenge of harnessing the heterogeneous communication
architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an …

Islands-of-cores approach for harnessing SMP/NUMA architectures in heterogeneous stencil computations

L Szustak, R Wyrzykowski, O Jakl - … , September 4-8, 2017, Proceedings 14, 2017 - Springer
SMP/NUMA systems are powerful HPC platforms which could be applied for a wide range of
real-life applications. These systems provide large capacity of shared memory, and allow …

Exploring OpenMP Accelerator Model in a real-life scientific application using hybrid CPU-MIC platforms

K Halbiniak, L Szustak, A Lastovetsky… - Proceedings 3rd …, 2016 - e-archivo.uc3m.es
The main goal of this paper is the suitability assessment of the OpenMP Accelerator Model
(OMPAM) for porting a real-life scientific application to heterogeneous platforms containing a …