Accelerating multicore reuse distance analysis with sampling and parallelization

DL Schuff, M Kulkarni, VS Pai - … of the 19th international conference on …, 2010 - dl.acm.org
Reuse distance analysis is a well-established tool for predicting cache performance, driving
compiler optimizations, and assisting visualization and manual optimization of programs …

A statistical multiprocessor cache model

E Berg, H Zeffer, E Hagersten - 2006 IEEE International …, 2006 - ieeexplore.ieee.org
The introduction of general-purpose microprocessors running multiple threads will put a
focus on methods and tools helping a programmer to write efficient parallel applications …

Refactoring for data locality

K Beyls, EH D'Hollander - Computer, 2009 - ieeexplore.ieee.org
Refactoring for data locality opens a new avenue for performance-oriented program
rewriting. SLO has broken down a large part of the complexity that software developers face …

Discovery of locality-improving refactorings by reuse path analysis

K Beyls, EH D'Hollander - … , HPCC 2006, Munich, Germany, September 13 …, 2006 - Springer
Due to the huge speed gaps in the memory hierarchy of modern computer architectures, it is
important that programs maintain a good data locality. Improving temporal locality implies …

Visualizing non-functional requirements

NA Ernst, Y Yu, J Mylopoulos - 2006 First International …, 2006 - ieeexplore.ieee.org
Information systems can be visualized with many tools. Typically these tools present
functional artifacts from various phases of the development life-cycle; these include …

Adaptive reorder buffers for SMT processors

J Sharkey, D Balkan, D Ponomarev - Proceedings of the 15th …, 2006 - dl.acm.org
In SMT processors, the complex interplay between private and shared datapath resources
needs to be considered in order to realize the full performance potential. In this paper, we …

Data cache-energy and throughput models: design exploration for embedded processors

MY Qadri, KD McDonald-Maier - EURASIP Journal on Embedded Systems, 2009 - Springer
Most modern 16-bit and 32-bit embedded processors contain cache memories to
further increase instruction throughput of the device. Embedded processors that contain …

Intermediately executed code is the key to find refactorings that improve temporal data locality

K Beyls, EH D'Hollander - Proceedings of the 3rd conference on …, 2006 - dl.acm.org
The growing speed gap between memory and processor makes an efficient use of the cache
ever more important to reach high performance. One of the most important ways to improve …

Towards architecture independent metrics for multicore performance analysis

M Kulkarni, V Pai, D Schuff - ACM SIGMETRICS Performance Evaluation …, 2011 - dl.acm.org
The prevalence of multicore architectures has made the performance analysis of
multithreaded applications an intriguing area of inquiry. An understanding of locality effects …

Path-based reuse distance analysis

C Fang, S Carr, S Önder, Z Wang - International Conference on Compiler …, 2006 - Springer
Profiling can effectively analyze program behavior and provide critical information for
feedback-directed or dynamic optimizations. Based on memory profiling, reuse distance …