[BOOK][B] The compiler design handbook: optimizations and machine code generation

YN Srikant, P Shankar - 2002 - taylorfrancis.com
The widespread use of object-oriented languages and Internet security concerns are just the
beginning. Add embedded systems, multiple memory banks, highly pipelined units …

Automatic data layout for distributed-memory machines

K Kennedy, U Kremer - ACM Transactions on Programming Languages …, 1998 - dl.acm.org
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a
simple yet efficient machine-independent parallel programming model. After the algorithm …

[PDF][PDF] IPERF: A framework for automatic construction of performance prediction models

CH Hsu, U Kremer - Workshop on Profile and Feedback-Directed …, 1998 - Citeseer
Performance prediction models at the source code level are crucial in optimizing compilers,
programming environments, and performance debugging tools. For each performance …

A linear algebra framework for automatic determination of optimal data layouts

M Kandemir, A Choudhary, N Shenoy… - … on Parallel and …, 1999 - ieeexplore.ieee.org
This paper presents a data layout optimization technique for sequential and parallel
programs based on the theory of hyperplanes from linear algebra. Given a program, our …

Optimizing data layouts for parallel computation on multicores

Y Zhang, W Ding, J Liu… - … Conference on Parallel …, 2011 - ieeexplore.ieee.org
The emergence of multicore platforms offers several opportunities for boosting application
performance. These opportunities, which include parallelism and data locality benefits …

A compiler technique for improving whole-program locality

MT Kandemir - ACM SIGPLAN Notices, 2001 - dl.acm.org
Exploiting spatial and temporal locality is essential for obtaining high performance on
modern computers. Writing programs that exhibit high locality of reference is difficult and …

Compiler techniques for the distribution of data and computation

A Navarro, E Zapata, D Padua - IEEE Transactions on Parallel …, 2003 - ieeexplore.ieee.org
This paper presents a new method that can be applied by a parallelizing compiler to find,
without user intervention, the iteration and data decompositions that minimize …

A framework for integrating data alignment, distribution, and redistribution in distributed memory multiprocessors

J Garcia, E Ayguadé, J Labarta - IEEE Transactions on Parallel …, 2001 - ieeexplore.ieee.org
Parallel architectures with physically distributed memory provide a cost-effective scalability
to solve many large scale scientific problems. However, these systems are very difficult to …

Tools and techniques for automatic data layout: A case study

E Ayguadé, J Garcia, U Kremer - Parallel Computing, 1998 - Elsevier
Parallel architectures with physically distributed memory providing computing cycles and
large amounts of memory are becoming more and more common. To make such …

[PDF][PDF] An integer linear programming approach for optimizing cache locality

M Kandemir, P Banerjee, A Choudhary… - Proceedings of the 13th …, 1999 - dl.acm.org
The actual performance of programs on modern processors that employ deep memory
hierarchies is closely related to the performance of the memory subsystem. Compiler …