CLITE: Efficient and QoS-aware co-location of multiple latency-critical jobs for warehouse scale computers

T Patel, D Tiwari - 2020 IEEE International Symposium on High …, 2020 - ieeexplore.ieee.org
Large-scale data centers run latency-critical jobs with quality-of-service (QoS) requirements,
and throughput-oriented background jobs, which need to achieve high performance …

MISO: exploiting multi-instance GPU capability on multi-tenant GPU clusters

B Li, T Patel, S Samsi, V Gadepally… - Proceedings of the 13th …, 2022 - dl.acm.org
GPU technology has been improving at an expedited pace in terms of size and performance,
empowering HPC and AI/ML researchers to advance the scientific discovery process …

CoPart: Coordinated partitioning of last-level cache and memory bandwidth for fairness-aware workload consolidation on commodity servers

J Park, S Park, W Baek - … of the Fourteenth EuroSys Conference 2019, 2019 - dl.acm.org
Workload consolidation is a widely-used technique to maximize server resource utilization in
cloud and datacenter computing. Recent commodity CPUs support last-level cache (LLC) …

Satori: efficient and fair resource partitioning by sacrificing short-term benefits for long-term gains

RB Roy, T Patel, D Tiwari - 2021 ACM/IEEE 48th Annual …, 2021 - ieeexplore.ieee.org
Multi-core architectures have enabled data centers to increasingly co-locate multiple jobs to
improve resource utilization and lower the operational cost. Unfortunately, naively co …

Characterizing job microarchitectural profiles at scale: Dataset and analysis

K Wang, Y Li, C Wang, T Jia, K Chow, Y Wen… - Proceedings of the 51st …, 2022 - dl.acm.org
Understanding the microarchitectural resource characteristics of datacenter jobs has
become increasingly critical to guarantee the performance of jobs while improving resource …

MT^2: Memory Bandwidth Regulation on Hybrid NVM/DRAM Platforms

J Yi, B Dong, M Dong, R Tong, H Chen - 20th USENIX Conference on …, 2022 - usenix.org
Non-volatile memory (NVM) has emerged as a new memory media, resulting in a hybrid
NVM/DRAM configuration in typical servers. Memory-intensive applications competing for …

NyxCache: Flexible and efficient multi-tenant persistent memory caching

K Wu, K Tu, Y Patel, R Sen, K Park… - … USENIX Conference on …, 2022 - usenix.org
We present NyxCache (Nyx), an access regulation framework for multi-tenant persistent
memory (PM) caching that supports light-weight access regulation, per-cache resource …

EMBA: Efficient memory bandwidth allocation to improve performance on Intel commodity processor

Y Xiang, C Ye, X Wang, Y Luo, Z Wang - Proceedings of the 48th …, 2019 - dl.acm.org
On multi-core processors, contention on shared resources such as the last level cache (LLC)
and memory bandwidth may cause serious performance degradation, which makes efficient …

Effect of hyper-threading in latency-critical multithreaded cloud applications and utilization analysis of the major system resources

L Pons, J Feliu, J Puche, C Huang, S Petit… - Future Generation …, 2022 - Elsevier
Multithreaded latency-critical applications represent an important subset of workloads
running on public cloud systems. Most of these systems deploy powerful computing servers …

Reinforcement learning-based resource partitioning for improving responsiveness in cloud gaming

Y Li, X Wang, H Liu, L Pu, S Tang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Cloud gaming has been very popular in recent years, but issues relating to maintaining low
interaction delay to guarantee satisfactory user experience are still prevalent. We observe …