On fusing recursive traversals of Kd trees

S Rajbhandari, J Kim, S Krishnamoorthy… - Proceedings of the 25th …, 2016 - dl.acm.org
Loop fusion is a key program transformation for data locality optimization that is
implemented in production compilers. But optimizing compilers for imperative languages …

Dynamic determinacy race detection for task parallelism with futures

R Surendran, V Sarkar - International Conference on Runtime Verification, 2016 - Springer
Existing dynamic determinacy race detectors for task-parallel programs are limited to
programs with strict computation graphs, where a task can only wait for its descendant tasks …

Optimized two-level parallelization for GPU accelerators using the polyhedral model

J Shirako, A Hayashi, V Sarkar - … of the 26th International Conference on …, 2017 - dl.acm.org
While GPUs play an increasingly important role in today's high-performance computers,
optimizing GPU performance continues to impose large burdens upon programmers. A …

A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment

S Rajbhandari, J Kim, S Krishnamoorthy… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
This paper describes the design and implementation of a layered domain-specific compiler
to support MADNESS—Multiresolution ADaptive Numerical Environment for Scientific …

Cache locality optimization for recursive programs

J Lifflander, S Krishnamoorthy - Proceedings of the 38th ACM SIGPLAN …, 2017 - dl.acm.org
We present an approach to optimize the cache locality for recursive programs by
dynamically splicing (recursively interleaving) the execution of distinct function …

COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loop

P Mishra, VK Nandivada - ACM Transactions on Architecture and Code …, 2024 - dl.acm.org
Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the
threads, using a programmer-specified scheduling policy. While the existing scheduling …

Applications of complex augmented kernels to wind profile prediction

A Kuh, D Mandic - … on Acoustics, Speech and Signal Processing, 2009 - ieeexplore.ieee.org
This paper combines complex signal processing with kernel methods for applications in
wind prediction. Specifically, we consider developing least squares kernel algorithms for …

Static prediction of parallel computation graphs

SK Muller - Proceedings of the ACM on Programming Languages, 2022 - dl.acm.org
Many algorithms for analyzing parallel programs, for example to detect deadlocks or data
races or to calculate the execution cost, are based on a model variously known as a cost …

Deadlock avoidance in parallel programs with futures: why parallel tasks should not wait for strangers

T Cogumbreiro, R Surendran, F Martins… - Proceedings of the …, 2017 - dl.acm.org
Futures are an elegant approach to expressing parallelism in functional programs. However,
combining futures with imperative programming (as in C++ or in Java) can lead to pernicious …

The Parallel Semantics Program Dependence Graph

B Homerding, A Patel, EA Deiana, Y Su, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
A compiler's intermediate representation (IR) defines a program's execution plan by
encoding its instructions and their relative order. Compiler optimizations aim to replace a …