Ernest: Efficient performance prediction for Large-Scale advanced analytics

S Venkataraman, Z Yang, M Franklin, B Recht… - … USENIX Symposium on …, 2016 - usenix.org
Recent workload trends indicate rapid growth in the deployment of machine learning,
genomics and scientific workloads on cloud computing infrastructure. However, efficiently …

[BOOK][B] Sourcebook of parallel computing

J Dongarra, I Foster, G Fox, W Gropp, K Kennedy… - 2003 - websrv.cs.fsu.edu
Michael Mascagni Monte Carlo methods (MCMs) have been, and continue to be, very
popular algorithms for solving a wide variety of problems in science, engineering, and …

The GrADS project: Software support for high-level grid application development

F Berman, A Chien, K Cooper… - … Journal of High …, 2001 - journals.sagepub.com
Advances in networking technologies will soon make it possible to use the global
information infrastructure in a qualitatively different way—as a computational as well as an …

The rise and fall of High Performance Fortran: an historical object lesson

K Kennedy, C Koelbel, H Zima - Proceedings of the third ACM SIGPLAN …, 2007 - dl.acm.org
High Performance Fortran (HPF) is a high-level data-parallel programming system based on
Fortran. The effort to standardize HPF began in 1991, at the Supercomputing Conference in …

SUIF Explorer: an interactive and interprocedural parallelizer

SW Liao, A Diwan, RP Bosch Jr, A Ghuloum… - Proceedings of the …, 1999 - dl.acm.org
The SUIF Explorer is an interactive parallelization tool that is more effective than previous
systems in minimizing the number of lines of code that require programmer assistance. First …

Multiprocessors should support simple memory consistency models

MD Hill - Computer, 1998 - ieeexplore.ieee.org
In the future, many computers will contain multiple processors, in part because the marginal
cost of adding a few additional processors is so low that only minimal performance gain is …

The Jrpm system for dynamically parallelizing Java programs

MK Chen, K Olukotun - Proceedings of the 30th annual international …, 2003 - dl.acm.org
We describe the Java runtime parallelizing machine (Jrpm), a complete system for
parallelizing sequential programs automatically. Jrpm is based on a chip multiprocessor …

[PDF][PDF] Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

P Ranganathan, VS Pai, SV Adve - … of the ninth annual ACM symposium …, 1997 - dl.acm.org
This paper studies techniques to improve the performance of memory consistency models
for shared-memory multi-processors with ILP processors. The first part of this paper extends …

HPCView: A tool for top-down analysis of node performance

J Mellor-Crummey, RJ Fowler, G Marin… - The Journal of …, 2002 - Springer
It is increasingly difficult for complex scientific programs to attain a significant fraction of peak
performance on systems that are based on microprocessors with substantial instruction-level …

Using thread-level speculation to simplify manual parallelization

MK Prabhu, K Olukotun - Proceedings of the ninth ACM SIGPLAN …, 2003 - dl.acm.org
In this paper, we provide examples of how thread-level speculation (TLS) simplifies manual
parallelization and enhances its performance. A number of techniques for manual …