Effective automatic parallelization of stencil computations

S Krishnamoorthy, M Baskaran, U Bondhugula… - ACM sigplan …, 2007 - dl.acm.org
Performance optimization of stencil computations has been widely studied in the literature,
since they occur in many computationally intensive scientific and engineering applications …

Optimizing compiler for the cell processor

AE Eichenberger, K O'Brien, K O'Brien… - 14th International …, 2005 - ieeexplore.ieee.org
Developed for multimedia and game applications, as well as other numerically intensive
workloads, the CELL processor provides support both for highly parallel codes, which have …

The Jrpm system for dynamically parallelizing Java programs

MK Chen, K Olukotun - Proceedings of the 30th annual international …, 2003 - dl.acm.org
We describe the Java runtime parallelizing machine (Jrpm), a complete system for
parallelizing sequential programs automatically. Jrpm is based on a chip multiprocessor …

Component based simulation modeling with Simkit

A Buss - Proceedings of the Winter Simulation Conference, 2002 - ieeexplore.ieee.org
This paper demonstrates how to use Simkit to create discrete event simulation models using
a component framework. The component framework is based on a listener design pattern …

Auditory distance perception by translating observers

JM Speigle, JM Loomis - … of 1993 IEEE Research Properties in …, 1993 - ieeexplore.ieee.org
The authors consider auditory distance perception of a moving observer and its relevance
for the perception of stationary and moving sources. They begin with a review of some of the …

Toast: A heterogeneous memory management system

M Bailleu, D Stavrakakis, R Rocha… - Proceedings of the …, 2024 - dl.acm.org
Modern applications employ several heterogeneous memory types for improved
performance, security, and reliability. To manage them, programmers must currently digress …

Implementation of NAS parallel benchmarks in high performance fortran

M Frumkin, H Jin, J Yan - NAS Technical Report NAS-98-009, 1998 - academia.edu
We present an HPF implementation of BT, SP, LU, FT, CG and MG of the NPB2.3-serial
benchmark set. The implementation is based on HPF performance model of the benchmark …

TEST: a tracer for extracting speculative threads

M Chen, K Olukotun - International Symposium on Code …, 2003 - ieeexplore.ieee.org
Thread-level speculation (TLS) allows sequential programs to be arbitrarily decomposed
into threads that can be safely executed in parallel. A key challenge for TLS processors is …

Increasing temporal locality with skewing and recursive blocking

G Jin, J Mellor-Crummey, R Fowler - Proceedings of the 2001 ACM/IEEE …, 2001 - dl.acm.org
We present a strategy, called recursive prismatic time skewing, that increases temporal reuse
at all memory hierarchy levels, thus improving the performance of scientific codes that use …

Automatic data and computation decomposition on distributed memory parallel computers

P Lee, ZM Kedem - ACM Transactions on Programming Languages and …, 2002 - dl.acm.org
To exploit parallelism on shared memory parallel computers (SMPCs), it is natural to focus
on decomposing the computation (mainly by distributing the iterations of the nested Do …