Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …

A city-scale estimation of rooftop solar photovoltaic potential based on deep learning

T Zhong, Z Zhang, M Chen, K Zhang, Z Zhou, R Zhu… - Applied Energy, 2021 - Elsevier
The estimation of rooftop solar photovoltaic (PV) potential is crucial for policymaking around
sustainable energy plans. But it is difficult to accurately estimate the availability of rooftop …

CLTune: A generic auto-tuner for OpenCL kernels

C Nugteren, V Codreanu - 2015 IEEE 9th International …, 2015 - ieeexplore.ieee.org
This work presents CLTune, an auto-tuner for OpenCL kernels. It evaluates and tunes kernel
performance of a generic, user-defined search space of possible parameter-value …

AsyMo: scalable and efficient deep-learning inference on asymmetric mobile CPUs

M Wang, S Ding, T Cao, Y Liu, F Xu - Proceedings of the 27th Annual …, 2021 - dl.acm.org
On-device deep learning (DL) inference has attracted vast interest. Mobile CPUs are the
most common hardware for on-device inference and many inference frameworks have been …

Kernel Tuner: A search-optimizing GPU code auto-tuner

B van Werkhoven - Future Generation Computer Systems, 2019 - Elsevier
A very common problem in GPU programming is that some combination of thread block
dimensions and other code optimization parameters, like tiling or unrolling factors, results in …

The interface between forensic science and technology: how technology could cause a paradigm shift in the role of forensic institutes in the criminal justice system

A Kloosterman, A Mapes, Z Geradts… - … of the Royal …, 2015 - royalsocietypublishing.org
In this paper, the importance of modern technology in forensic investigations is discussed.
Recent technological developments are creating new possibilities to perform robust …

Accelerating sparse CNN inference on GPUs with performance-aware weight pruning

MA Rumi, X Ma, Y Wang, P Jiang - Proceedings of the ACM International …, 2020 - dl.acm.org
Weight pruning is a popular technique to reduce the size and computation complexity of the
Convolutional Neural Networks (CNNs). Despite its success in reducing the model size …

Optimization of parallel iterated local search algorithms on graphics processing unit

Y Zhou, F He, Y Qiu - The Journal of Supercomputing, 2016 - Springer
Local search metaheuristics (LSMs) are efficient methods for solving hard optimization
problems in science, engineering, economics and technology. By using LSMs, we could …

Bayesian Optimization for auto-tuning GPU kernels

FJ Willemsen, R van Nieuwpoort… - … and Simulation of …, 2021 - ieeexplore.ieee.org
Finding optimal parameter configurations for tunable GPU kernels is a non-trivial exercise
for large search spaces, even when automated. This poses an optimization task on a …