GKLEE: concolic verification and test generation for GPUs

G Li, P Li, G Sawaya, G Gopalakrishnan… - Proceedings of the 17th …, 2012 - dl.acm.org
Programs written for GPUs often contain correctness errors such as races, deadlocks, or
may compute the wrong result. Existing debugging tools often miss these errors because of …

Divergence reduction in Monte Carlo neutron transport with on-GPU asynchronous scheduling

B Cuneo, M Bailey - ACM Transactions on Modeling and Computer …, 2024 - dl.acm.org
While Monte Carlo Neutron Transport (MCNT) is near-embarrassingly parallel, the effectively
unpredictable lifetime of neutrons can lead to divergence when MCNT is evaluated on …

Automatic optimization of thread-coarsening for graphics processors

A Magni, C Dubach, M O'Boyle - … of the 23rd international conference on …, 2014 - dl.acm.org
OpenCL has been designed to achieve functional portability across multi-core devices from
different vendors. However, the lack of a single cross-target optimizing compiler severely …

A large-scale cross-architecture evaluation of thread-coarsening

A Magni, C Dubach, MFP O'Boyle - Proceedings of the International …, 2013 - dl.acm.org
OpenCL has become the de-facto data parallel programming model for parallel devices in
today's high-performance supercomputers. OpenCL was designed with the goal of …

A sparse probabilistic learning algorithm for real-time tracking

Blake, Cipolla - Proceedings Ninth IEEE International …, 2003 - ieeexplore.ieee.org
We address the problem of applying powerful pattern recognition algorithms based on
kernels to efficient visual tracking. Recently S. Avidan (2001) has shown that object …

Convergence and scalarization for data-parallel architectures

Y Lee, R Krashinsky, V Grover… - Proceedings of the …, 2013 - ieeexplore.ieee.org
Modern throughput processors such as GPUs achieve high performance and efficiency by
exploiting data parallelism in application kernels expressed as threaded code. One draw …

Partial control-flow linearization

S Moll, S Hack - ACM SIGPLAN Notices, 2018 - dl.acm.org
If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a
SIMD program, several targets of a branch might be executed because of divergence …

Effective function merging in the SSA form

RCO Rocha, P Petoumenos, Z Wang, M Cole… - Proceedings of the 41st …, 2020 - dl.acm.org
Function merging is an important optimization for reducing code size. This technique
eliminates redundant code across functions by merging them into a single function. While …

Function merging by sequence alignment

RCO Rocha, P Petoumenos, Z Wang… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
Resource-constrained devices for embedded systems are becoming increasingly important.
In such systems, memory is highly restrictive, making code size in most cases even more …

DARM: control-flow melding for SIMT thread divergence reduction

C Saumya, K Sundararajah… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group
of threads—wavefront or warp—execute instructions in lockstep. When threads in a group …