B Cuneo, M Bailey - ACM Transactions on Modeling and Computer …, 2024 - dl.acm.org
While Monte Carlo Neutron Transport (MCNT) is near-embarrassingly parallel, the effectively unpredictable lifetime of neutrons can lead to divergence when MCNT is evaluated on …
A Magni, C Dubach, M O'Boyle - … of the 23rd international conference on …, 2014 - dl.acm.org
OpenCL has been designed to achieve functional portability across multi-core devices from different vendors. However, the lack of a single cross-target optimizing compiler severely …
A Magni, C Dubach, MFP O'Boyle - Proceedings of the International …, 2013 - dl.acm.org
OpenCL has become the de facto data-parallel programming model for parallel devices in today's high-performance supercomputers. OpenCL was designed with the goal of …
We address the problem of applying powerful pattern recognition algorithms based on kernels to efficient visual tracking. Recently S. Avidan (2001) has shown that object …
Y Lee, R Krashinsky, V Grover… - Proceedings of the …, 2013 - ieeexplore.ieee.org
Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data parallelism in application kernels expressed as threaded code. One draw …
S Moll, S Hack - ACM SIGPLAN Notices, 2018 - dl.acm.org
If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a SIMD program, several targets of a branch might be executed because of divergence …
Function merging is an important optimization for reducing code size. This technique eliminates redundant code across functions by merging them into a single function. While …
Resource-constrained devices for embedded systems are becoming increasingly important. In such systems, memory is highly restrictive, making code size in most cases even more …
GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group of threads—wavefront or warp—execute instructions in lockstep. When threads in a group …