User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely …
As much of the world's computing continues to move into the cloud, the overprovisioning of computing resources to ensure the performance isolation of latency-sensitive tasks, such as …
H Yang, A Breslow, J Mars, L Tang - ACM SIGARCH Computer …, 2013 - dl.acm.org
Ensuring the quality of service (QoS) for latency-sensitive applications while allowing co-locations of multiple applications on servers is critical for improving server utilization and …
A Herdrich, E Verplanke, P Autee… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Over the last decade, addressing quality of service (QoS) in multi-core server platforms has been a growing research topic. QoS techniques have been proposed to address the shared …
Many multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that sharing-oblivious cache management …
Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need …
S Kim, H Genc, VV Nikiforov, K Asanović… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Driven by the wide adoption of deep neural networks (DNNs) across different application domains, multi-tenancy execution, where multiple DNNs are deployed simultaneously on …
In this paper we study the impact of sharing memory resources on five Google datacenter applications: a web search engine, bigtable, content analyzer, image stitching, and protocol …
Computing workloads often contain a mix of interactive, latency-sensitive foreground applications and recurring background computations. To guarantee responsiveness …