A compiler for throughput optimization of graph algorithms on GPUs

S Pai, K Pingali - Proceedings of the 2016 ACM SIGPLAN International …, 2016 - dl.acm.org
Writing high-performance GPU implementations of graph algorithms can be challenging. In
this paper, we argue that three optimizations called throughput optimizations are key to high …

Overhauling SC atomics in C11 and OpenCL

M Batty, AF Donaldson, J Wickerson - … of the 43rd Annual ACM SIGPLAN …, 2016 - dl.acm.org
Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic
operations and fences in the C11 and OpenCL memory models is subtle, leading to …

CURD: a dynamic CUDA race detector

Y Peng, V Grover, J Devietti - ACM SIGPLAN Notices, 2018 - dl.acm.org
As GPUs have become an integral part of nearly every processor, GPU programming has
become increasingly popular. GPU programming requires a combination of extreme levels …

Exposing errors related to weak memory in GPU applications

T Sorensen, AF Donaldson - ACM SIGPLAN Notices, 2016 - dl.acm.org
We present the systematic design of a testing environment that uses stressing and fuzzing to
reveal errors in GPU applications that arise due to weak memory effects. We evaluate our …

Fast and precise symbolic analysis of concurrency bugs in device drivers (T)

P Deligiannis, AF Donaldson… - 2015 30th IEEE/ACM …, 2015 - ieeexplore.ieee.org
Concurrency errors, such as data races, make device drivers notoriously hard to develop
and debug without automated tool support. We present Whoop, a new automated approach …

Portable inter-workgroup barrier synchronisation for GPUs

T Sorensen, AF Donaldson, M Batty… - Proceedings of the …, 2016 - dl.acm.org
Despite the growing popularity of GPGPU programming, there is not yet a portable and
formally-specified barrier that one can use to synchronise across workgroups. Moreover, the …

Barracuda: Binary-level analysis of runtime races in CUDA programs

A Eizenberg, Y Peng, T Pigli, W Mansky… - Proceedings of the 38th …, 2017 - dl.acm.org
GPU programming models enable and encourage massively parallel programming with
over a million threads, requiring extreme parallelism to achieve good performance. Massive …

Symbolic identification of shared memory based bank conflicts for GPUs

A Horga, A Rezine, S Chattopadhyay, P Eles… - Journal of Systems …, 2022 - Elsevier
Graphic processing units (GPUs) are routinely used for general purpose computations to
improve performance. To achieve the sought performance gains, care must be invested in …

Memory access protocols: certified data-race freedom for GPU kernels

T Cogumbreiro, J Lange, D Liew, H Zicarelli - Formal Methods in System …, 2023 - Springer
GPUs offer parallelism as a commodity, but they are difficult to program correctly. Static
analyzers that guarantee data-race freedom (DRF) are essential to help programmers …

IGUARD: In-GPU advanced race detection

AK Kamath, A Basu - Proceedings of the ACM SIGOPS 28th Symposium …, 2021 - dl.acm.org
Newer use cases of GPU (Graphics Processing Unit) computing, e.g., graph analytics, look
less like traditional bulk-synchronous GPU programs. To cater to the needs of emerging …