Towards dynamic and safe configuration tuning for cloud databases

X Zhang, H Wu, Y Li, J Tan, F Li, B Cui - Proceedings of the 2022 …, 2022 - dl.acm.org
Configuration knobs of database systems are essential to achieve high throughput and low
latency. Recently, automatic tuning systems using machine learning methods (ML) have …

Finding the right cloud configuration for analytics clusters

M Bilal, M Canini, R Rodrigues - … of the 11th ACM Symposium on Cloud …, 2020 - dl.acm.org
Finding good cloud configurations for deploying a single distributed system is already a
challenging task, and it becomes substantially harder when a data analytics cluster is …

LOCAT: Low-overhead online configuration auto-tuning of Spark SQL applications

J Xin, K Hwang, Z Yu - … of the 2022 International Conference on …, 2022 - dl.acm.org
Spark SQL has been widely deployed in industry but it is challenging to tune its
performance. Recent studies try to employ machine learning (ML) to solve this problem, but …

Rover: An online Spark SQL tuning service via generalized transfer learning

Y Shen, X Ren, Y Lu, H Jiang, H Xu, D Peng… - Proceedings of the 29th …, 2023 - dl.acm.org
Distributed data analytic engines like Spark are common choices to process massive data in
industry. However, the performance of Spark SQL highly depends on the choice of …

You only run once: Spark auto-tuning from a single run

DB Prats, FA Portella, CHA Costa… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Tuning configurations of Spark jobs is not a trivial task. State-of-the-art auto-tuning systems
are based on iteratively running workloads with different configurations. During the …

C3O: Collaborative cluster configuration optimization for distributed data processing in public clouds

J Will, L Thamsen, D Scheinert… - … Conference on Cloud …, 2021 - ieeexplore.ieee.org
Distributed dataflow systems enable data-parallel processing of large datasets on clusters.
Public cloud providers offer a large variety and quantity of resources that can be used for …

Bellamy: Reusing performance models for distributed dataflow jobs across contexts

D Scheinert, L Thamsen, H Zhu, J Will… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Distributed dataflow systems enable the use of clusters for scalable data analytics. However,
selecting appropriate cluster resources for a processing job is often not straightforward …

Accelerating the configuration tuning of big data analytics with similarity-aware multitask Bayesian optimization

A Fekry, L Carata, T Pasquier… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
One of the key challenges for data analytics deployment is configuration tuning. The existing
approaches for configuration tuning are expensive and overlook the dynamic characteristics …

Tuning parameters of Apache Spark with Gauss–Pareto-based multi-objective optimization

MM Öztürk - Knowledge and Information Systems, 2024 - Springer
When there is a need to make an ultimate decision about the unique features of big data
platforms, one should note that they have configurable parameters. Apache Spark is an …

Enel: Context-aware dynamic scaling of distributed dataflow jobs using graph propagation

D Scheinert, H Zhu, L Thamsen… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data
analytics. While runtime prediction models can be used to initially select appropriate cluster …