User-facing latency-sensitive web services include numerous distributed, intercommunicating microservices that promise to simplify software development and …
Deep learning (DL) training jobs bring some unique challenges to existing cluster managers, such as unpredictable training times, an all-or-nothing execution model, and …
Deep learning workloads are common in today's production clusters due to the proliferation of deep learning-driven AI services (e.g., speech recognition, machine translation). A deep …
Serverless computing is becoming increasingly popular due to its ease of programming, fast elasticity, and fine-grained billing. However, the serverless provider still needs to provision …
Video cameras are pervasively deployed for security and smart city scenarios, with millions of them in large cities worldwide. Achieving the potential of these cameras requires …
We describe the design and implementation of Protean, the Microsoft Azure service responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services in both the research community and industry. When operating a datacenter, optimization of …
W Chen, W Wang, B Peng, Q Wen, T Zhou… - Proceedings of the 28th …, 2022 - dl.acm.org
Time series forecasting is a critical and challenging problem in many real applications. Recently, Transformer-based models prevail in time series forecasting due to their …
Serving ML prediction pipelines spanning multiple models and hardware accelerators is a key challenge in production machine learning. Optimally configuring these pipelines to meet …