Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Graph IRs for impure higher-order languages: making aggressive optimizations affordable with precise effect dependencies

O Bračevac, G Wei, S Jia, S Abeysinghe… - Proceedings of the …, 2023 - dl.acm.org
Graph-based intermediate representations (IRs) are widely used for powerful compiler
optimizations, either interprocedurally in pure functional languages, or intraprocedurally in …

Scalable kernel fusion for memory-bound GPU applications

M Wahib, N Maruyama - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org
GPU implementations of HPC applications relying on finite difference methods can include
tens of kernels that are memory-bound. Kernel fusion can improve performance by reducing …

TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations

RP Pelaez, G Simeon, R Galvelis… - Journal of Chemical …, 2024 - ACS Publications
Achieving a balance between computational speed, prediction accuracy, and universal
applicability in molecular simulations has been a persistent challenge. This paper presents …

Demystifying BERT: System design implications

S Pati, S Aga, N Jayasena… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …

A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit

F Petrovič, D Střelák, J Hozzová, J Ol'ha… - Future Generation …, 2020 - Elsevier
In recent years, the heterogeneity of both commodity and supercomputers hardware has
increased sharply. Accelerators, such as GPUs or Intel Xeon Phi co-processors, are often …

Automatic horizontal fusion for GPU kernels

A Li, B Zheng, G Pekhimenko… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
We present automatic horizontal fusion, a novel optimization technique that complements
the standard kernel fusion techniques for GPU programs. Unlike the standard fusion, whose …

When ML Training Cuts Through Congestion: Just-in-Time Gradient Compression via Packet Trimming

X Chen, S Vargaftik, RB Basat - Proceedings of the 23rd ACM Workshop …, 2024 - dl.acm.org
Distributed training of ML models generates significant network traffic when exchanging
gradients and is sensitive to packet drops and retransmission caused by congestion when …

A performance analysis of parallel differential dynamic programming on a GPU

B Plancher, S Kuindersma - … Foundations of Robotics XIII: Proceedings of …, 2020 - Springer
Parallelism can be used to significantly increase the throughput of computationally
expensive algorithms. With the widespread adoption of parallel computing platforms such as …

GPU parallelization strategies for metaheuristics: a survey

M Essaid, L Idoumghar, J Lepagnot… - International Journal of …, 2019 - Taylor & Francis
Metaheuristics have been showing interesting results in solving hard optimization problems.
However, they become limited in terms of effectiveness and runtime for high dimensional …