Autopilot: workload autoscaling at Google

K Rzadca, P Findeisen, J Swiderski, P Zych… - Proceedings of the …, 2020 - dl.acm.org
In many public and private Cloud systems, users need to specify a limit for the amount of
resources (CPU cores and RAM) to provision for their workloads. A job that exceeds its limits …

Predictive performance modeling for distributed batch processing using black box monitoring and machine learning

C Witt, M Bux, W Gusew, U Leser - Information Systems, 2019 - Elsevier
In many domains, the previous decade was characterized by increasing data volumes and
growing complexity of data analyses, creating new demands for batch processing on …

Pocket: Elastic ephemeral storage for serverless analytics

A Klimovic, Y Wang, P Stuedi, A Trivedi… - … USENIX Symposium on …, 2018 - usenix.org
Serverless computing is becoming increasingly popular, enabling users to quickly launch
thousands of short-lived tasks in the cloud with high elasticity and fine-grain billing. These …

Llama: A heterogeneous & serverless framework for auto-tuning video analytics pipelines

F Romero, M Zhao, NJ Yadwadkar… - Proceedings of the ACM …, 2021 - dl.acm.org
The proliferation of camera-enabled devices and large video repositories has led to a
diverse set of video analytics applications. These applications rely on video pipelines …

Performance and cost-efficient Spark job scheduling based on deep reinforcement learning in cloud computing environments

MT Islam, S Karunasekera… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Big data frameworks such as Spark and Hadoop are widely adopted to run analytics jobs in
both research and industry. Cloud offers affordable compute resources which are easier to …

Finding Faster Configurations Using FLASH

V Nair, Z Yu, T Menzies, N Siegmund… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Finding good configurations of a software system is often challenging since the number of
configuration options can be large. Software engineers often make poor choices about …

Taming performance variability

A Maricq, D Duplyakin, I Jimenez, C Maltzahn… - … USENIX Symposium on …, 2018 - usenix.org
The performance of compute hardware varies: software run repeatedly on the same server
(or a different server with supposedly identical parts) can produce performance results that …

AlloX: compute allocation in hybrid clusters

TN Le, X Sun, M Chowdhury, Z Liu - Proceedings of the Fifteenth …, 2020 - dl.acm.org
Modern deep learning frameworks support a variety of hardware, including CPU, GPU, and
other accelerators, to perform computation. In this paper, we study how to schedule jobs …

Arrow: Low-level augmented Bayesian optimization for finding the best cloud VM

CJ Hsu, V Nair, VW Freeh… - 2018 IEEE 38th …, 2018 - ieeexplore.ieee.org
With the advent of big data applications, which tend to have longer execution time, choosing
the right cloud VM has significant performance and economic implications. For example, in …

Selecta: Heterogeneous cloud storage configuration for data analytics

A Klimovic, H Litz, C Kozyrakis - 2018 USENIX Annual Technical …, 2018 - usenix.org
Data analytics are an important class of data-intensive workloads on public cloud services.
However, selecting the right compute and storage configuration for these applications is …