A survey on cache tuning from a power/energy perspective

W Zang, A Gordon-Ross - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
Low power and/or energy consumption is a requirement not only in embedded systems that
run on batteries or have limited cooling capabilities, but also in desktop and mainframes …

Hybrid, scalable, trace-driven performance modeling of GPGPUs

Y Arafa, AH Badawy, A ElWazir, A Barai… - Proceedings of the …, 2021 - dl.acm.org
In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs.
PPT-GPU achieves scalability through a hybrid high-level modeling approach where some …

Generating cache hints for improved program efficiency

K Beyls, EH D'Hollander - Journal of Systems Architecture, 2005 - Elsevier
Cache hints are one of the new extensions in EPIC architectures. On each memory
instruction, two kinds of hints can be attached: a source cache hint and a target cache hint …

Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles

Y Arafa, AH Badawy, G Chennupati, A Barai… - Proceedings of the 34th …, 2020 - dl.acm.org
In this paper, we introduce an accurate and scalable memory modeling framework for
General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem. That is Performance …

Cache optimization for embedded processor cores: An analytical approach

A Ghosh, T Givargis - ACM Transactions on Design Automation of …, 2004 - dl.acm.org
Embedded microprocessor cores are increasingly being used in embedded and mobile
devices. The software running on these embedded microprocessor cores is often a priori …

Admission policies for caches of search engine results

R Baeza-Yates, F Junqueira, V Plachouras… - String Processing and …, 2007 - Springer
This paper studies the impact of the tail of the query distribution on caches of Web search
engines, and proposes a technique for achieving higher hit ratios compared to traditional …

Architecture independent performance characterization and benchmarking for scientific applications

E Strohmaier, H Shan - The IEEE Computer Society's 12th …, 2004 - ieeexplore.ieee.org
A simple, tunable, synthetic benchmark with a performance directly related to applications
would be of great benefit to the scientific computing community. We present a novel …

Synchrotrace: synchronization-aware architecture-agnostic traces for light-weight multicore simulation

S Nilakantan, K Sangaiah, A More… - … Analysis of Systems …, 2015 - ieeexplore.ieee.org
Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over
execution-driven simulation, such as reducing simulation time and complexity, and allowing …

Scalable performance prediction of codes with memory hierarchy and pipelines

G Chennupati, N Santhi, S Eidenbenz - Proceedings of the 2019 ACM …, 2019 - dl.acm.org
We present the Analytical Memory Model with Pipelines (AMMP) of the Performance
Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware …

A scalable analytical memory model for CPU performance prediction

G Chennupati, N Santhi, R Bird, S Thulasidasan… - … , and Simulation: 8th …, 2018 - Springer
As the US Department of Energy (DOE) invests in exascale computing, performance
modeling of physics codes on CPUs remains a challenge in computational co-design due to …