Compiler transformations for high-performance computing

DF Bacon, SL Graham, OJ Sharp - ACM Computing Surveys (CSUR), 1994 - dl.acm.org
In the last three decades a large number of compiler transformations for optimizing programs
have been implemented. Most optimizations for uniprocessors reduce the number of …

On-the-fly detection of data races for programs with nested fork-join parallelism

J Mellor-Crummey - Proceedings of the 1991 ACM/IEEE Conference on …, 1991 - dl.acm.org
Detecting data races in shared-memory parallel programs is an important debugging
problem. This paper presents a new protocol for run-time detection of data races in …

Analysis of benchmark characteristics and benchmark performance prediction

RH Saavedra, AJ Smith - ACM Transactions on Computer Systems …, 1996 - dl.acm.org
Standard benchmarking provides the run times for given programs on given machines, but
fails to provide insight as to why those results were obtained (either in terms of machine or …

A static performance estimator to guide data partitioning decisions

V Balasundaram, G Fox, K Kennedy… - Proceedings of the third …, 1991 - dl.acm.org
The choice of the data domain partitioning scheme is an important factor in determining the
available parallelism and hence the performance of an application on a distributed memory …

Loop distribution with arbitrary control flow

K Kennedy, KS McKinley - SC, 1990 - Citeseer
Loop distribution is an integral part of transforming a sequential program into a parallel one.
It is used extensively in parallelization, vectorization, and memory management. For loops …

An interactive environment for data partitioning and distribution

V Balasundaram, G Fox… - 5th Distributed …, 1990 - researchwithrutgers.com
An approach to distributed memory parallel programming that has recently become popular
is one where the programmer explicitly specifies the data decomposition using language …

Compile-time support for efficient data race detection in shared-memory parallel programs

J Mellor-Crummey - ACM SIGPLAN Notices, 1993 - dl.acm.org
In an execution of a shared-memory parallel program, a data race is said to
exist when there are two or more accesses to the same shared variable, at least one access …

Automatic and interactive program parallelization using the Cetus source to source compiler infrastructure v2.0

A Bhosale, P Barakhshan, MR Rosas, R Eigenmann - Electronics, 2022 - mdpi.com
This paper presents an overview and evaluation of the existing and newly added analysis
and transformation techniques in the Cetus source-to-source compiler infrastructure. Cetus …

A mechanism for keeping useful internal information in parallel programming tools: The data access descriptor

V Balasundaram - Journal of Parallel and Distributed Computing, 1990 - Elsevier
An important aspect of any parallel programming tool is its ability to provide useful
information that can help the user optimize a program for efficient parallel execution or …

Interprocedural transformations for parallel code generation

MW Hall, K Kennedy, KS McKinley - Proceedings of the 1991 ACM/IEEE …, 1991 - dl.acm.org
We present a new approach that enables compiler optimization of procedure calls and loop
nests containing procedure calls. We introduce two interprocedural transformations that …