Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Applying the roofline model

G Ofenbeck, R Steinmann, V Caparros… - … Analysis of Systems …, 2014 - ieeexplore.ieee.org
The recently introduced roofline model plots the performance of executed code against its
operational intensity (operations count divided by memory traffic). It also includes two …

Bi-objective optimization of data-parallel applications on heterogeneous HPC platforms for performance and energy through workload distribution

H Khaleghzadeh, M Fahad, A Shahid… - … on Parallel and …, 2020 - ieeexplore.ieee.org
Performance and energy are the two most important objectives for optimization on modern
parallel platforms. In this article, we show that moving from single-objective optimization for …

Performance Modeling for FPGAs: Extending the Roofline Model with High‐Level Synthesis Tools

B Da Silva, A Braeken, EH D'Hollander… - International Journal …, 2013 - Wiley Online Library
The potential of FPGAs as accelerators for high‐performance computing applications is very
large, but many factors are involved in their performance. The design for FPGAs and the …

Sea-land segmentation using deep learning techniques for Landsat-8 OLI imagery

T Yang, S Jiang, Z Hong, Y Zhang, Y Han, R Zhou… - Marine …, 2020 - Taylor & Francis
Automated coastline extraction from optical satellites is fundamental to coastal mapping, and
sea-land segmentation is the core technology of coastline extraction. Deep convolutional …

Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling

A Lopes, F Pratas, L Sousa, A Ilic - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Optimization, portability and development of GPGPU applications are not trivial tasks, since
the capabilities and organization of GPU processing elements and memory subsystem …

Optimization of parallel iterated local search algorithms on graphics processing unit

Y Zhou, F He, Y Qiu - The Journal of Supercomputing, 2016 - Springer
Local search metaheuristics (LSMs) are efficient methods for solving hard optimization
problems in science, engineering, economics and technology. By using LSMs, we could …

A multi-GPU accelerated parallel domain decomposition one-step leapfrog ADI-FDTD

S Liu, B Zou, L Zhang, S Ren - IEEE Antennas and Wireless …, 2020 - ieeexplore.ieee.org
In this letter, a multi-GPU accelerated one-step leapfrog alternative-direction-implicit finite-
difference time-domain (ADI-FDTD) based on parallel SPIKE tridiagonal systems solver is …

Efficient sparse-dense matrix-matrix multiplication on GPUs using the customized sparse storage format

S Shi, Q Wang, X Chu - 2020 IEEE 26th International …, 2020 - ieeexplore.ieee.org
Multiplication of a sparse matrix to a dense matrix (SpDM) is widely used in many areas like
scientific computing and machine learning. However, existing work under-looks the …

FPGA design space exploration for scientific HPC applications using a fast and accurate cost model based on roofline analysis

SW Nabi, W Vanderbauwhede - Journal of Parallel and Distributed …, 2019 - Elsevier
High-performance computing on heterogeneous platforms in general and those with FPGAs
in particular presents a significant programming challenge. We contend that compiler …