The NAS parallel benchmarks

DH Bailey, E Barszcz, JT Barton… - The International …, 1991 - journals.sagepub.com
A new set of benchmarks has been developed for the performance evaluation of highly
parallel supercomputers. These consist of five "parallel kernel" benchmarks and three …

The NAS parallel benchmarks—summary and preliminary results

DH Bailey, E Barszcz, JT Barton, DS Browning… - Proceedings of the …, 1991 - dl.acm.org
A new set of benchmarks has been developed for the performance evaluation of highly
parallel supercomputers. These benchmarks consist of five "parallel kernel" benchmarks and …

Maximizing loop parallelism and improving data locality via loop fusion and distribution

K Kennedy, KS McKinley - … Workshop on Languages and Compilers for …, 1993 - Springer
Loop fusion is a program transformation that merges multiple loops into one. It is effective for
reducing the synchronization overhead of parallel loops and for improving data locality. This …

A historical application profiler for use by parallel schedulers

R Gibbons - Workshop on Job Scheduling Strategies for Parallel …, 1997 - Springer
Scheduling algorithms that use application and system knowledge have been shown to be
more effective at scheduling parallel jobs on a multiprocessor than algorithms that do not …

Practical dependence testing

G Goff, K Kennedy, CW Tseng - ACM SIGPLAN Notices, 1991 - dl.acm.org
Precise and efficient dependence tests are essential to the effectiveness of a parallelizing
compiler. This paper proposes a dependence testing scheme based on classifying pairs …

Analysis of benchmark characteristics and benchmark performance prediction

RH Saavedra, AJ Smith - ACM Transactions on Computer Systems …, 1996 - dl.acm.org
Standard benchmarking provides the run-times for given programs on given machines, but
fails to provide insight as to why those results were obtained (either in terms of machine or …

Design issues in division and other floating-point operations

SF Oberman, MJ Flynn - IEEE Transactions on Computers, 1997 - ieeexplore.ieee.org
Floating-point division is generally regarded as a low frequency, high latency operation in
typical floating-point applications. However, in the worst case, a high latency hardware …

An implementation of interprocedural bounded regular section analysis

P Havlak, K Kennedy - IEEE Transactions on Parallel and Distributed …, 1991 - Citeseer
Optimizing compilers should produce efficient code even in the presence of high-level
language constructs. However, current programming support systems are significantly …

Optimizing for parallelism and data locality

K Kennedy, KS McKinley - … of the 6th international conference on …, 1992 - dl.acm.org
Previous research has used program transformation to introduce parallelism and to exploit
data locality. Unfortunately, these two objectives have usually been considered …

Gated SSA-based demand-driven symbolic analysis for parallelizing compilers

P Tu, D Padua - Proceedings of the 9th International Conference on …, 1995 - dl.acm.org
In this paper, we present a GSA-based technique that performs more efficient and more
precise symbolic analysis of predicated assignments, recurrences and index arrays. The …