A survey of cache simulators

H Brais, R Kalayappan, PR Panda - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Computer architecture simulation tools are essential for implementing and evaluating new
ideas in the domain and can be useful for understanding the behavior of programs and …

Performance analysis and optimization for SpMV based on aligned storage formats on an ARM processor

Y Zhang, W Yang, K Li, D Tang, K Li - Journal of Parallel and Distributed …, 2021 - Elsevier
Sparse matrix-vector multiplication (SpMV) has always been a hot topic of research for
scientific computing and big data processing, but the sparsity and discontinuity of the …

An analytical model for performance and lifetime estimation of hybrid DRAM-NVM main memories

R Salkhordeh, O Mutlu, H Asadi - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Emerging Non-Volatile Memories (NVMs) have promising advantages (e.g., lower idle power,
higher density, and non-volatility) over the existing predominant main memory technology …

Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles

Y Arafa, AH Badawy, G Chennupati, A Barai… - Proceedings of the 34th …, 2020 - dl.acm.org
In this paper, we introduce an accurate and scalable memory modeling framework for
General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem. That is Performance …

ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer

MA Sasongko, M Chabbi, MB Marzijarani… - ACM Transactions on …, 2021 - dl.acm.org
One widely used metric that measures data locality is reuse distance—the number of unique
memory locations that are accessed between two consecutive accesses to a particular …

Analytical derivation of concurrent reuse distance profile for multi-threaded application running on chip multi-processor

JM Sabarimuthu, TG Venkatesh - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Reuse distance has been shown to be a useful metric for performance analysis of caches
and programs, locality analysis and compiler optimization. Concurrent reuse distance profile …

NCDE: In-Network Caching for Directory Entries to Expedite Data Access in Tiled-Chip Multiprocessors

JE Shim, M Kang, TH Han - IEEE Access, 2023 - ieeexplore.ieee.org
The processing of data-intensive applications, followed by an unprecedented amount of
data traffic, drives explosive accesses to the memory subsystem. The overloaded memory …

Analytical modeling the multi-core shared cache behavior with considerations of data-sharing and coherence

M Ling, X Lu, G Wang, J Ge - IEEE Access, 2021 - ieeexplore.ieee.org
To mitigate the ever worsening “Power wall” and “Memory wall” problems, multi-core
architectures with multi-level cache hierarchies have been widely accepted in modern …

A Profiling-Based Approach to Cache Partitioning of Program Data

S Breiter, J Weidendorfer, MT Chung… - … Conference on Parallel …, 2022 - Springer
Cache efficiency is important to avoid unnecessary data transfers and to keep processors
active. Cache partitioning, a technique to virtually divide a cache into multiple partitions, has …

Fast modeling L2 cache reuse distance histograms using combined locality information from software traces

M Ling, J Ge, G Wang - Journal of Systems Architecture, 2020 - Elsevier
To mitigate the performance gap between CPU and the main memory, multi-level cache
architectures are widely used in modern processors. Therefore, modeling the behaviors of …