Divergence analysis and optimizations

G Li, P Li, G Sawaya, G Gopalakrishnan… - Proceedings of the 17th …, 2012 - dl.acm.org

Programs written for GPUs often contain correctness errors such as races, deadlocks, or
may compute the wrong result. Existing debugging tools often miss these errors because of …

Cited by 182 · Related articles · All 11 versions

[PDF] acm.org

Divergence reduction in Monte Carlo neutron transport with on-GPU asynchronous scheduling

B Cuneo, M Bailey - ACM Transactions on Modeling and Computer …, 2024 - dl.acm.org

While Monte Carlo Neutron Transport (MCNT) is near-embarrassingly parallel, the effectively
unpredictable lifetime of neutrons can lead to divergence when MCNT is evaluated on …

Cited by 2 · Related articles
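
The divergence described in the MCNT snippet can be made concrete with a toy lockstep model. This is a minimal sketch, not the authors' on-GPU asynchronous scheduler. It assumes 32-lane warps, assumes each warp keeps issuing until its longest-lived lane finishes, and draws hypothetical neutron lifetimes at random. Sorting lanes by lifetime before grouping them is a simple stand-in for the general idea of regrouping similar work.

```python
import random

WARP_WIDTH = 32  # assumption: 32 lanes per warp, as on NVIDIA GPUs

def simt_efficiency(lifetimes, width=WARP_WIDTH):
    """Fraction of issued lane-steps that do useful work when every warp
    runs in lockstep until its longest-lived lane has finished."""
    useful = 0
    issued = 0
    for start in range(0, len(lifetimes), width):
        warp = lifetimes[start:start + width]
        useful += sum(warp)            # steps the lanes actually needed
        issued += max(warp) * width    # steps the whole warp stayed occupied
    return useful / issued

# Hypothetical lifetimes: number of transport steps per neutron
rng = random.Random(0)
lifetimes = [1 + int(rng.expovariate(0.1)) for _ in range(4096)]

print(f"arbitrary grouping:  {simt_efficiency(lifetimes):.2f}")
print(f"grouped by lifetime: {simt_efficiency(sorted(lifetimes)):.2f}")
```

Sorting works here only because the toy model knows every lifetime in advance. A real transport code does not, since lifetimes are effectively unpredictable, which is why scheduling decisions have to be made while the simulation runs.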

[PDF] ed.ac.uk

Automatic optimization of thread-coarsening for graphics processors

A Magni, C Dubach, M O'Boyle - … of the 23rd international conference on …, 2014 - dl.acm.org

OpenCL has been designed to achieve functional portability across multi-core devices from
different vendors. However, the lack of a single cross-target optimizing compiler severely …

Cited by 104 · Related articles · All 8 versions

[PDF] psu.edu

A large-scale cross-architecture evaluation of thread-coarsening

A Magni, C Dubach, MFP O'Boyle - Proceedings of the International …, 2013 - dl.acm.org

OpenCL has become the de-facto data parallel programming model for parallel devices in
today's high-performance supercomputers. OpenCL was designed with the goal of …

Cited by 92 · Related articles · All 9 versions

[PDF] cam.ac.uk

A sparse probabilistic learning algorithm for real-time tracking

Blake, Cipolla - Proceedings Ninth IEEE International …, 2003 - ieeexplore.ieee.org

We address the problem of applying powerful pattern recognition algorithms based on
kernels to efficient visual tracking. Recently, S. Avidan (2001) has shown that object …

Cited by 159 · Related articles · All 18 versions

[PDF] psu.edu

Convergence and scalarization for data-parallel architectures

Y Lee, R Krashinsky, V Grover… - Proceedings of the …, 2013 - ieeexplore.ieee.org

Modern throughput processors such as GPUs achieve high performance and efficiency by
exploiting data parallelism in application kernels expressed as threaded code. One draw …

Cited by 85 · Related articles · All 14 versions

[PDF] uni-saarland.de

Partial control-flow linearization

S Moll, S Hack - ACM SIGPLAN Notices, 2018 - dl.acm.org

If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a
SIMD program, several targets of a branch might be executed because of divergence …

Cited by 48 · Related articles · All 4 versions
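
The if-conversion mentioned in this snippet can be illustrated by simulating a group of SIMD lanes in plain Python. This is a toy sketch, not the paper's partial linearization algorithm. The lane values and the branch are hypothetical, and the second function only shows the intuition that a branch whose condition is uniform across all lanes does not need both targets executed.

```python
def scalar_kernel(x):
    """Per-thread source: if (x > 0) y = 2 * x; else y = -x;"""
    return 2 * x if x > 0 else -x

def if_converted(xs):
    """Full if-conversion: run both branch targets on every lane, then blend
    the results with the per-lane predicate. Returns (values, blocks run)."""
    mask = [x > 0 for x in xs]
    then_vals = [2 * x for x in xs]   # 'then' block, executed for all lanes
    else_vals = [-x for x in xs]      # 'else' block, executed for all lanes
    blended = [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]
    return blended, 2

def if_converted_uniform_aware(xs):
    """Keep a real branch when all lanes agree; linearize only on divergence."""
    mask = [x > 0 for x in xs]
    if all(mask):
        return [2 * x for x in xs], 1
    if not any(mask):
        return [-x for x in xs], 1
    return if_converted(xs)

lanes = [3, -1, 4, -1, 5, -9, 2, -6]
print(if_converted(lanes))
print(if_converted_uniform_aware([1, 2, 3, 4]))
```

Both versions compute the same per-lane results as `scalar_kernel`. The difference is only in how many blocks the whole group has to execute.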

[PDF] whiterose.ac.uk

Effective function merging in the SSA form

RCO Rocha, P Petoumenos, Z Wang, M Cole… - Proceedings of the 41st …, 2020 - dl.acm.org

Function merging is an important optimization for reducing code size. This technique
eliminates redundant code across functions by merging them into a single function. While …

Cited by 31 · Related articles · All 15 versions
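
Here is a hand-written Python illustration of the idea in this snippet: two near-identical functions are replaced by one merged body plus thin wrappers. The function names and the `func_id` selector parameter are hypothetical. Compilers perform this transformation on an intermediate representation (the title refers to SSA form) rather than on source code.

```python
def scale_then_add(xs, k):
    return [x * k + 1 for x in xs]

def scale_then_sub(xs, k):
    return [x * k - 1 for x in xs]

def merged(xs, k, func_id):
    """One body shared by both originals; the hypothetical func_id selects
    the only operation in which they differ."""
    out = []
    for x in xs:
        t = x * k                                     # common code, kept once
        out.append(t + 1 if func_id == 0 else t - 1)  # the single difference
    return out

# Thin wrappers keep the original signatures so existing callers still work.
def scale_then_add_merged(xs, k):
    return merged(xs, k, 0)

def scale_then_sub_merged(xs, k):
    return merged(xs, k, 1)
```

In this toy example, the shared loop and multiply are stored once. The cost is the extra parameter and the per-element select, so merging only pays off when the two functions are similar enough.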

[PDF] whiterose.ac.uk

Function merging by sequence alignment

RCO Rocha, P Petoumenos, Z Wang… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org

Resource-constrained devices for embedded systems are becoming increasingly important.
In such systems, memory is highly restrictive, making code size in most cases even more …

Cited by 38 · Related articles · All 19 versions
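
The title names sequence alignment as the core technique. A standard way to align two sequences globally is Needleman-Wunsch dynamic programming. The sketch below applies it to two hypothetical opcode lists, assuming unit scores for matches, mismatches and gaps. Pairs of identical opcodes are candidates for sharing in a merged function, and everything else would stay separate. This is an illustrative assumption, not a description of the paper's exact scoring or code generation.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns aligned pairs, with None
    marking a gap on one side."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return pairs[::-1]

f_ops = ["load", "mul", "add", "store", "ret"]   # hypothetical function f
g_ops = ["load", "mul", "sub", "store", "ret"]   # hypothetical function g
pairs = align(f_ops, g_ops)
shared = [x for x, y in pairs if x == y]         # opcodes that could be merged
print(pairs)
print(shared)
```

Here the mismatched pair `("add", "sub")` would have to be emitted as two alternatives chosen by a function selector. The four identical pairs could be emitted once.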

[PDF] nsf.gov

DARM: control-flow melding for SIMT thread divergence reduction

C Saumya, K Sundararajah… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org

GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group
of threads—wavefront or warp—execute instructions in lockstep. When threads in a group …

Cited by 10 · Related articles · All 7 versions
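
The lockstep cost behind this snippet can be seen in a toy issue-slot count. The sketch makes three assumptions: one issue slot per instruction, a warp that serializes the two sides of a divergent branch, and two paths that have been melded by hand so their common multiply-add is issued once with a per-lane operand select. It illustrates the motivation for control-flow melding, not DARM's actual algorithm.

```python
def run_divergent(xs):
    """Lockstep model: the warp issues the 'then' path with the even lanes
    active, then the 'else' path with the odd lanes active."""
    mask = [x % 2 == 0 for x in xs]
    out = list(xs)
    issued = 0
    if any(mask):                    # then: y = x * 3 + 7  (mul, add)
        issued += 2
        for i, active in enumerate(mask):
            if active:
                out[i] = xs[i] * 3 + 7
    if not all(mask):                # else: y = x * 5 + 7  (mul, add)
        issued += 2
        for i, active in enumerate(mask):
            if not active:
                out[i] = xs[i] * 5 + 7
    return out, issued

def run_melded(xs):
    """Melded: both paths share one mul/add; a select picks the operand."""
    k = [3 if x % 2 == 0 else 5 for x in xs]     # select (1 slot)
    out = [x * kk + 7 for x, kk in zip(xs, k)]   # shared mul, add (2 slots)
    return out, 3

warp = list(range(8))
print(run_divergent(warp), run_melded(warp))
```

When every lane takes the same side, the divergent version issues only 2 slots and the melded one issues 3. In this toy model, melding pays off only when the warp actually diverges.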