Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

B Wu, Z Zhao, EZ Zhang, Y Jiang, X Shen - ACM SIGPLAN Notices, 2013 - dl.acm.org
The performance of Graphic Processing Units (GPU) is sensitive to irregular memory
references. Some recent work shows the promise of data reorganization for eliminating non …

SIMD parallelization of applications that traverse irregular data structures

B Ren, G Agrawal, JR Larus… - Proceedings of the …, 2013 - ieeexplore.ieee.org
Fine-grained data parallelism is increasingly common in mainstream processors in the form
of longer vectors and on-chip GPUs. This paper develops support for exploiting such data …

General transformations for GPU execution of tree traversals

M Goldfarb, Y Jo, M Kulkarni - … of the International Conference on High …, 2013 - dl.acm.org
With the advent of programmer-friendly GPU computing environments, there has been much
interest in offloading workloads that can exploit the high degree of parallelism available on …

On fusing recursive traversals of Kd trees

S Rajbhandari, J Kim, S Krishnamoorthy… - Proceedings of the 25th …, 2016 - dl.acm.org
Loop fusion is a key program transformation for data locality optimization that is
implemented in production compilers. But optimizing compilers for imperative languages …

Verifying array manipulating programs by tiling

S Chakraborty, A Gupta, D Unadkat - International Static Analysis …, 2017 - Springer
Formally verifying properties of programs that manipulate arrays in loops is computationally
challenging. In this paper, we focus on a useful class of such programs, and present a novel …

Efficient execution of recursive programs on commodity vector hardware

B Ren, Y Jo, S Krishnamoorthy, K Agrawal… - ACM SIGPLAN …, 2015 - dl.acm.org
The pursuit of computational efficiency has led to the proliferation of throughput-oriented
hardware, from GPUs to increasingly wide vector units on commodity processors and …

Automatically enhancing locality for tree traversals with traversal splicing

Y Jo, M Kulkarni - Proceedings of the ACM international conference on …, 2012 - dl.acm.org
Generally applicable techniques for improving temporal locality in irregular programs, which
operate over pointer-based data structures such as trees and graphs, are scarce. Focusing …

Automatic vectorization of tree traversals

Y Jo, M Goldfarb, M Kulkarni - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org
Repeated tree traversals are ubiquitous in many domains such as scientific simulation, data
mining and graphics. Modern commodity processors support SIMD instructions, and using …

Miniphases: Compilation using modular and efficient tree transformations

D Petrashko, O Lhoták, M Odersky - Proceedings of the 38th ACM …, 2017 - dl.acm.org
Production compilers commonly perform dozens of transformations on an intermediate
representation. Running those transformations in separate passes harms performance. One …

A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment

S Rajbhandari, J Kim, S Krishnamoorthy… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
This paper describes the design and implementation of a layered domain-specific compiler
to support MADNESS—Multiresolution ADaptive Numerical Environment for Scientific …