DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against
improving performance, scalability, and energy efficiency in modern systems. Computer systems …

Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design

N Talati, K May, A Behroozi, Y Yang… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …

Figaro: Improving system performance via fine-grained in-DRAM data relocation and caching

Y Wang, L Orosa, X Peng, Y Guo… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Main memory, composed of DRAM, is a performance bottleneck for many applications, due
to the high DRAM access latency. In-DRAM caches work to mitigate this latency by …

I-SPY: Context-driven conditional instruction prefetching with coalescing

TA Khan, A Sriraman, J Devietti… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Modern data center applications have rapidly expanding instruction footprints that lead to
frequent instruction cache misses, increasing cost and degrading data center performance …

Bubble-Up: Increasing utilization in modern warehouse scale computers via sensible co-locations

J Mars, L Tang, R Hundt, K Skadron… - Proceedings of the 44th …, 2011 - dl.acm.org
As much of the world's computing continues to move into the cloud, the overprovisioning of
computing resources to ensure the performance isolation of latency-sensitive tasks, such as …

Bingo spatial data prefetcher

M Bakhshalipour, M Shakerinava… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Applications extensively use data objects with a regular and fixed layout, which leads to the
recurrence of access patterns over memory regions. Spatial data prefetching techniques …

DSPatch: Dual spatial pattern prefetcher

R Bera, AV Nori, O Mutlu, S Subramoney - … of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org
High main memory latency continues to limit performance of modern high-performance
out-of-order cores. While DRAM latency has remained nearly the same over many generations …

AsmDB: Understanding and mitigating front-end stalls in warehouse-scale computers

G Ayers, NP Nagendra, DI August, HK Cho… - Proceedings of the 46th …, 2019 - dl.acm.org
The large instruction working sets of private and public cloud workloads lead to frequent
instruction cache misses and costs in the millions of dollars. While prior work has identified …

Ripple: Profile-guided instruction cache replacement for data center applications

TA Khan, D Zhang, A Sriraman… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Modern data center applications exhibit deep software stacks, resulting in large instruction
footprints that frequently cause instruction cache misses degrading performance, cost, and …

Rethinking software runtimes for disaggregated memory

I Calciu, MT Imran, I Puddu, S Kashyap… - Proceedings of the 26th …, 2021 - dl.acm.org
Disaggregated memory can address resource provisioning inefficiencies in current
datacenters. Multiple software runtimes for disaggregated memory have been proposed in …