Early detection of configuration errors to reduce failure damage

T Xu, X Jin, P Huang, Y Zhou, S Lu, L Jin… - … USENIX Symposium on …, 2016 - usenix.org
Early detection is the key to minimizing failure damage induced by configuration errors,
especially those errors in configurations that control failure handling and fault tolerance …

Testing configuration changes in context to prevent production failures

X Sun, R Cheng, J Chen, E Ang, O Legunsen… - … USENIX Symposium on …, 2020 - usenix.org
Large-scale cloud services deploy hundreds of configuration changes to production systems
daily. At such velocity, configuration changes have inevitably become prevalent causes of …

Learning patterns in configuration

R Bhagwan, S Mehta… - 2021 36th IEEE/ACM …, 2021 - ieeexplore.ieee.org
Large services depend on correct configuration to run efficiently and seamlessly. Checking
such configuration for correctness is important because services use a large and …

REEF: Retainable evaluator execution framework

M Weimer, Y Chen, BG Chun, T Condie… - Proceedings of the …, 2015 - dl.acm.org
Resource Managers like Apache YARN have emerged as a critical layer in the cloud
computing system stack, but the developer abstractions for leasing cluster resources and …

REEF: Retainable evaluator execution framework

BG Chun, T Condie, C Curino, C Douglas… - Proceedings of the …, 2013 - dl.acm.org
In this demo proposal, we describe REEF, a framework that makes it easy to implement
scalable, fault-tolerant runtime environments for a range of computational models. We will …

Finding heterogeneous-unsafe configuration parameters in cloud systems

S Ma, F Zhou, MD Bond, Y Wang - Proceedings of the Sixteenth …, 2021 - dl.acm.org
With the increasing prevalence of heterogeneous hardware and the increasing need for
online reconfiguration, there is increasing demand for heterogeneous configurations …

Apache REEF: Retainable evaluator execution framework

BG Chun, T Condie, Y Chen, B Cho, A Chung… - ACM Transactions on …, 2017 - dl.acm.org
Resource Managers like YARN and Mesos have emerged as a critical layer in the cloud
computing system stack, but the developer abstractions for leasing cluster resources and …

Challenges to error diagnosis in Hadoop ecosystems

JZ Li, S He, L Zhu, X Xu, M Fu, L Bass, A Liu… - 27th Large Installation …, 2013 - usenix.org
Deploying a large-scale distributed ecosystem such as HBase/Hadoop in the cloud is
complicated and error-prone. Multiple layers of largely independently evolving software are …

A Real‐Time Detection Method of Software Configuration Errors Based on Fine‐Grained Configuration Item Types

L Zhang, S Hao, M Ming - Scientific Programming, 2022 - Wiley Online Library
With the continuous expansion of software scale and the continuous complexity of software
functions, abnormal parameter configuration often brings adverse effects to the software …

Intelligent software service configuration technology based on association mining

F Wang, Z Zhao, Z Wang, M Ma… - Journal of Physics …, 2022 - iopscience.iop.org
Association relationship and types between software service configuration parameters
determine that the configuration items must meet the corresponding rules and constraints in …