The design and implementation of a verification technique for GPU kernels

S Pai, K Pingali - Proceedings of the 2016 ACM SIGPLAN International …, 2016 - dl.acm.org

Writing high-performance GPU implementations of graph algorithms can be challenging. In
this paper, we argue that three optimizations called throughput optimizations are key to high …

Cited by 121 · Related articles · All 9 versions

[PDF] arxiv.org

Overhauling SC atomics in C11 and OpenCL

M Batty, AF Donaldson, J Wickerson - … of the 43rd Annual ACM SIGPLAN …, 2016 - dl.acm.org

Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic
operations and fences in the C11 and OpenCL memory models is subtle, leading to …

Cited by 100 · Related articles · All 15 versions

[PDF] acm.org

CURD: a dynamic CUDA race detector

Y Peng, V Grover, J Devietti - ACM SIGPLAN Notices, 2018 - dl.acm.org

As GPUs have become an integral part of nearly every processor, GPU programming has
become increasingly popular. GPU programming requires a combination of extreme levels …

Cited by 37 · Related articles · All 4 versions

[PDF] ucsc.edu

Exposing errors related to weak memory in GPU applications

T Sorensen, AF Donaldson - ACM SIGPLAN Notices, 2016 - dl.acm.org

We present the systematic design of a testing environment that uses stressing and fuzzing to
reveal errors in GPU applications that arise due to weak memory effects. We evaluate our …

Cited by 45 · Related articles · All 9 versions

[PDF] psu.edu

Fast and precise symbolic analysis of concurrency bugs in device drivers (t)

P Deligiannis, AF Donaldson… - 2015 30th IEEE/ACM …, 2015 - ieeexplore.ieee.org

Concurrency errors, such as data races, make device drivers notoriously hard to develop
and debug without automated tool support. We present Whoop, a new automated approach …

Cited by 46 · Related articles · All 10 versions

[PDF] kent.ac.uk

Portable inter-workgroup barrier synchronisation for GPUs

T Sorensen, AF Donaldson, M Batty… - Proceedings of the …, 2016 - dl.acm.org

Despite the growing popularity of GPGPU programming, there is not yet a portable and
formally-specified barrier that one can use to synchronise across workgroups. Moreover, the …

Cited by 41 · Related articles · All 10 versions

[PDF] acm.org

Barracuda: Binary-level analysis of runtime races in CUDA programs

A Eizenberg, Y Peng, T Pigli, W Mansky… - Proceedings of the 38th …, 2017 - dl.acm.org

GPU programming models enable and encourage massively parallel programming with
over a million threads, requiring extreme parallelism to achieve good performance. Massive …

Cited by 36 · Related articles · All 5 versions

[HTML] sciencedirect.com

Symbolic identification of shared memory based bank conflicts for GPUs

A Horga, A Rezine, S Chattopadhyay, P Eles… - Journal of Systems …, 2022 - Elsevier

Graphic processing units (GPUs) are routinely used for general purpose computations to
improve performance. To achieve the sought performance gains, care must be invested in …

Cited by 7 · Related articles · All 4 versions

[PDF] nsf.gov

Memory access protocols: certified data-race freedom for GPU kernels

T Cogumbreiro, J Lange, D Liew, H Zicarelli - Formal Methods in System …, 2023 - Springer

GPUs offer parallelism as a commodity, but they are difficult to program correctly. Static
analyzers that guarantee data-race freedom (DRF) are essential to help programmers …

Cited by 4 · Related articles · All 4 versions

[PDF] iisc.ac.in

IGUARD: In-GPU advanced race detection

AK Kamath, A Basu - Proceedings of the ACM SIGOPS 28th Symposium …, 2021 - dl.acm.org

Newer use cases of GPU (Graphics Processing Unit) computing, e.g., graph analytics, look
less like traditional bulk-synchronous GPU programs. To cater to the needs of emerging …

Cited by 7 · Related articles · All 5 versions
