Morpheus: Towards automated SLOs for enterprise clusters

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org

The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

Cited by 4 · Related articles · All 2 versions

[PDF] usenix.org

FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices

H Qiu, SS Banerjee, S Jha, ZT Kalbarczyk… - 14th USENIX symposium …, 2020 - usenix.org

User-facing latency-sensitive web services include numerous distributed,
intercommunicating microservices that promise to simplify software development and …

Cited by 239 · Related articles · All 11 versions

[PDF] usenix.org

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org

Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

Cited by 387 · Related articles · All 13 versions

[PDF] kaust.edu.sa

Optimus: an efficient dynamic resource scheduler for deep learning clusters

Y Peng, Y Bao, Y Chen, C Wu, C Guo - Proceedings of the Thirteenth …, 2018 - dl.acm.org

Deep learning workloads are common in today's production clusters due to the proliferation
of deep learning driven AI services (e.g., speech recognition, machine translation). A deep …

Cited by 473 · Related articles · All 3 versions

[PDF] nsf.gov

Faster and cheaper serverless computing on harvested resources

Y Zhang, Í Goiri, GI Chaudhry, R Fonseca… - Proceedings of the …, 2021 - dl.acm.org

Serverless computing is becoming increasingly popular due to its ease of programming, fast
elasticity, and fine-grained billing. However, the serverless provider still needs to provision …

Cited by 93 · Related articles · All 4 versions

[PDF] usenix.org

Live video analytics at scale with approximation and Delay-Tolerance

H Zhang, G Ananthanarayanan, P Bodik… - … USENIX Symposium on …, 2017 - usenix.org

Video cameras are pervasively deployed for security and smart city scenarios, with millions
of them in large cities worldwide. Achieving the potential of these cameras requires …

Cited by 496 · Related articles · All 11 versions

[PDF] usenix.org

Protean: VM allocation service at scale

O Hadary, L Marshall, I Menache, A Pan… - … USENIX Symposium on …, 2020 - usenix.org

We describe the design and implementation of Protean--the Microsoft Azure service
responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …

Cited by 146 · Related articles · All 7 versions

[PDF] arxiv.org

Characterization and prediction of deep learning workloads in large-scale GPU datacenters

Q Hu, P Sun, S Yan, Y Wen, T Zhang - Proceedings of the International …, 2021 - dl.acm.org

Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …

Cited by 103 · Related articles · All 6 versions

[PDF] researchgate.net

Learning to rotate: Quaternion transformer for complicated periodical time series forecasting

W Chen, W Wang, B Peng, Q Wen, T Zhou… - Proceedings of the 28th …, 2022 - dl.acm.org

Time series forecasting is a critical and challenging problem in many real applications.
Recently, Transformer-based models prevail in time series forecasting due to their …

Cited by 49 · Related articles · All 2 versions

[PDF] acm.org

InferLine: latency-aware provisioning and scaling for prediction serving pipelines

D Crankshaw, GE Sela, X Mo, C Zumar… - Proceedings of the 11th …, 2020 - dl.acm.org

Serving ML prediction pipelines spanning multiple models and hardware accelerators is a
key challenge in production machine learning. Optimally configuring these pipelines to meet …

Cited by 113 · Related articles · All 3 versions
