Supercomputer performance evaluation and the perfect benchmarks

DH Bailey, E Barszcz, JT Barton… - The International …, 1991 - journals.sagepub.com

A new set of benchmarks has been developed for the performance evaluation of highly
parallel supercomputers. These consist of five "parallel kernel" benchmarks and three "…

Cited by 3285 Related articles All 33 versions

[PDF] The NAS parallel benchmarks—summary and preliminary results

DH Bailey, E Barszcz, JT Barton, DS Browning… - Proceedings of the …, 1991 - dl.acm.org

A new set of benchmarks has been developed for the performance evaluation of highly
parallel supercomputers. These benchmarks consist of five "parallel kernel" benchmarks and …

Cited by 818 Related articles All 6 versions

Maximizing loop parallelism and improving data locality via loop fusion and distribution

K Kennedy, KS McKinley - … Workshop on Languages and Compilers for …, 1993 - Springer

Loop fusion is a program transformation that merges multiple loops into one. It is effective for
reducing the synchronization overhead of parallel loops and for improving data locality. This …

Cited by 388 Related articles All 8 versions

A historical application profiler for use by parallel schedulers

R Gibbons - Workshop on Job Scheduling Strategies for Parallel …, 1997 - Springer

Scheduling algorithms that use application and system knowledge have been shown to be
more effective at scheduling parallel jobs on a multiprocessor than algorithms that do not …

Cited by 313 Related articles All 24 versions

Practical dependence testing

G Goff, K Kennedy, CW Tseng - ACM SIGPLAN Notices, 1991 - dl.acm.org

Precise and efficient dependence tests are essential to the effectiveness of a parallelizing
compiler. This paper proposes a dependence testing scheme based on classifying pairs …

Cited by 391 Related articles All 5 versions

Analysis of benchmark characteristics and benchmark performance prediction

RH Saavedra, AJ Smith - ACM Transactions on Computer Systems …, 1996 - dl.acm.org

Standard benchmarking provides the run-times for given programs on given machines, but
fails to provide insight as to why those results were obtained (either in terms of machine or …

Cited by 304 Related articles All 14 versions

Design issues in division and other floating-point operations

SF Oberman, MJ Flynn - IEEE Transactions on computers, 1997 - ieeexplore.ieee.org

Floating-point division is generally regarded as a low frequency, high latency operation in
typical floating-point applications. However, in the worst case, a high latency hardware …

Cited by 268 Related articles All 9 versions

[PDF] An implementation of interprocedural bounded regular section analysis

P Havlak, K Kennedy - IEEE Transactions on Parallel and Distributed …, 1991 - Citeseer

Optimizing compilers should produce efficient code even in the presence of high-level
language constructs. However, current programming support systems are significantly …

Cited by 397 Related articles All 7 versions

Optimizing for parallelism and data locality

K Kennedy, KS McKinley - … of the 6th international conference on …, 1992 - dl.acm.org

Previous research has used program transformation to introduce parallelism and to exploit
data locality. Unfortunately, these two objectives have usually been considered …

Cited by 235 Related articles All 9 versions

[PDF] Gated SSA-based demand-driven symbolic analysis for parallelizing compilers

P Tu, D Padua - Proceedings of the 9th International Conference on …, 1995 - dl.acm.org

In this paper, we present a GSA-based technique that performs more efficient and more
precise symbolic analysis of predicated assignments, recurrences and index arrays. The …

Cited by 190 Related articles All 11 versions
