A survey on cache management mechanisms for real-time embedded systems

G Gracioli, A Alhammad, R Mancuso… - ACM Computing …, 2015 - dl.acm.org
Multicore processors are being extensively used by real-time systems, mainly because of
their demand for increased computing power. However, multicore processors have shared …

Heracles: Improving resource efficiency at scale

D Lo, L Cheng, R Govindaraju… - Proceedings of the …, 2015 - dl.acm.org
User-facing, latency-sensitive services, such as websearch, underutilize their computing
resources during daily periods of low traffic. Reusing those resources for other tasks is rarely …

Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations

J Mars, L Tang, R Hundt, K Skadron… - Proceedings of the 44th …, 2011 - dl.acm.org
As much of the world's computing continues to move into the cloud, the overprovisioning of
computing resources to ensure the performance isolation of latency-sensitive tasks, such as …

Bubble-flux: Precise online QoS management for increased utilization in warehouse scale computers

H Yang, A Breslow, J Mars, L Tang - ACM SIGARCH Computer …, 2013 - dl.acm.org
Ensuring the quality of service (QoS) for latency-sensitive applications while allowing
co-locations of multiple applications on servers is critical for improving server utilization and …

Cache QoS: From concept to reality in the Intel® Xeon® processor E5-2600 v3 product family

A Herdrich, E Verplanke, P Autee… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Over the last decade, addressing quality of service (QoS) in multi-core server platforms has
been a growing research topic. QoS techniques have been proposed to address the shared …

PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches

Y Xie, GH Loh - ACM SIGARCH Computer Architecture News, 2009 - dl.acm.org
Many multi-core processors employ a large last-level cache (LLC) shared among the
multiple cores. Past research has demonstrated that sharing-oblivious cache management …

Ubik: Efficient cache sharing with strict QoS for latency-critical workloads

H Kasture, D Sanchez - ACM Sigplan Notices, 2014 - dl.acm.org
Chip-multiprocessors (CMPs) must often execute workload mixes with different performance
requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need …

MoCA: Memory-centric, adaptive execution for multi-tenant deep neural networks

S Kim, H Genc, VV Nikiforov, K Asanović… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Driven by the wide adoption of deep neural networks (DNNs) across different application
domains, multi-tenancy execution, where multiple DNNs are deployed simultaneously on …

The impact of memory subsystem resource sharing on datacenter applications

L Tang, J Mars, N Vachharajani, R Hundt… - ACM SIGARCH …, 2011 - dl.acm.org
In this paper we study the impact of sharing memory resources on five Google datacenter
applications: a web search engine, bigtable, content analyzer, image stitching, and protocol …

A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness

H Cook, M Moreto, S Bird, K Dao… - ACM SIGARCH …, 2013 - dl.acm.org
Computing workloads often contain a mix of interactive, latency-sensitive foreground
applications and recurring background computations. To guarantee responsiveness …