Early detection of configuration errors to reduce failure damage

T Xu, X Jin, P Huang, Y Zhou, S Lu, L Jin… - … USENIX Symposium on …, 2016 - usenix.org
Early detection is the key to minimizing failure damage induced by configuration errors,
especially those errors in configurations that control failure handling and fault tolerance …

Context-based online configuration-error detection

D Yuan, Y Xie, R Panigrahy, J Yang… - Proceedings of the …, 2011 - dl.acm.org
Software failures due to configuration errors are commonplace as computer systems
continue to grow larger and more complex. Troubleshooting these configuration errors is a …

Testing configuration changes in context to prevent production failures

X Sun, R Cheng, J Chen, E Ang, O Legunsen… - … USENIX Symposium on …, 2020 - usenix.org
Large-scale cloud services deploy hundreds of configuration changes to production systems
daily. At such velocity, configuration changes have inevitably become prevalent causes of …

An empirical study on configuration errors in commercial and open source systems

Z Yin, X Ma, J Zheng, Y Zhou… - Proceedings of the …, 2011 - dl.acm.org
Configuration errors (i.e., misconfigurations) are among the dominant causes of system
failures. Their importance has inspired many research efforts on detecting, diagnosing, and …

Understanding, detecting and localizing partial failures in large system software

C Lou, P Huang, S Smith - 17th USENIX Symposium on Networked …, 2020 - usenix.org
Partial failures occur frequently in cloud systems and can cause serious damage including
inconsistency and data loss. Unfortunately, these failures are not well understood. Nor can …

Systems approaches to tackling configuration errors: A survey

T Xu, Y Zhou - ACM Computing Surveys (CSUR), 2015 - dl.acm.org
In recent years, configuration errors (i.e., misconfigurations) have become one of the
dominant causes of system failures, resulting in many severe service outages and …

Be conservative: Enhancing failure diagnosis with proactive logging

D Yuan, S Park, P Huang, Y Liu, MM Lee… - … USENIX Symposium on …, 2012 - usenix.org
When systems fail in the field, logged error or warning messages are frequently the only
evidence available for assessing and diagnosing the underlying cause. Consequently, the …

Do not blame users for misconfigurations

T Xu, J Zhang, P Huang, J Zheng, T Sheng… - Proceedings of the …, 2013 - dl.acm.org
Similar to software bugs, configuration errors are also one of the major causes of today's
system failures. Many configuration issues manifest themselves in ways similar to software …

Using causality to diagnose configuration bugs

M Attariyan - 2008 USENIX Annual Technical Conference (USENIX …, 2008 - usenix.org
We present a novel method for diagnosing configuration management errors. Our proposed
approach deduces the state of a buggy computer by running predicates that test system …

Relyzer: Exploiting application-level fault equivalence to analyze application resiliency to transient faults

SKS Hari, SV Adve, H Naeimi… - ACM SIGARCH …, 2012 - dl.acm.org
Future microprocessors need low-cost solutions for reliable operation in the presence of
failure-prone devices. A promising approach is to detect hardware faults by deploying low …