Predicting inter-thread cache contention on a chip multi-processor architecture

D Chandra, F Guo, S Kim… - … Symposium on High …, 2005 - ieeexplore.ieee.org
This paper studies the impact of L2 cache sharing on threads that simultaneously share the
cache, on a chip multi-processor (CMP) architecture. Cache sharing impacts threads …

[BOOK] Automatic performance tuning of sparse matrix kernels

RW Vuduc - 2003 - search.proquest.com
This dissertation presents an automated system to generate highly efficient, platform-
adapted implementations of sparse matrix kernels. We show that conventional …

A framework for performance modeling and prediction

A Snavely, L Carrington, N Wolter… - SC'02: Proceedings …, 2002 - ieeexplore.ieee.org
Cycle-accurate simulation is far too slow for modeling the expected performance of full
parallel applications on large HPC systems. And just running an application on a system …

Tiling optimizations for 3D scientific computations

G Rivera, CW Tseng - SC'00: Proceedings of the 2000 ACM …, 2000 - ieeexplore.ieee.org
Compiler transformations can significantly improve data locality for many scientific programs.
In this paper, we show iterative solvers for partial differential equations (PDEs) in three …

A survey on cache tuning from a power/energy perspective

W Zang, A Gordon-Ross - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
Low power and/or energy consumption is a requirement not only in embedded systems that
run on batteries or have limited cooling capabilities, but also in desktop and mainframes …

StatCache: A probabilistic approach to efficient and accurate data locality analysis

E Berg, E Hagersten - … Analysis of Systems and Software, 2004, 2004 - ieeexplore.ieee.org
The widening memory gap reduces performance of applications with poor data locality.
Therefore, there is a need for methods to analyze data locality and help application …

Combined selection of tile sizes and unroll factors using iterative compilation

T Kisuki, PMW Knijnenburg… - … Conference on Parallel …, 2000 - ieeexplore.ieee.org
Loop tiling and unrolling are two important program transformations to exploit locality and
expose instruction level parallelism, respectively. In this paper, we address the problem of …

Self-adapting linear algebra algorithms and software

J Demmel, J Dongarra, V Eijkhout… - Proceedings of the …, 2005 - ieeexplore.ieee.org
One of the main obstacles to the efficient solution of scientific problems is the problem of
tuning software, both to the available architecture and to the user problem at hand. We …

Counting integer points in parametric polytopes using Barvinok's rational functions

S Verdoolaege, R Seghir, K Beyls, V Loechner… - Algorithmica, 2007 - Springer
Many compiler optimization techniques depend on the ability to calculate the number of
elements that satisfy certain conditions. If these conditions can be represented by linear …

Program locality analysis using reuse distance

Y Zhong, X Shen, C Ding - ACM Transactions on Programming …, 2009 - dl.acm.org
On modern computer systems, the memory performance of an application depends on its
locality. For a single execution, locality-correlated measures like average miss rate or …