W Huang, J Fang, S Wan, C Xie… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data reliability, availability, and serviceability (RAS) of erasure-coded data centers are highly affected by data repair induced by node failures. In a traditional failure identification …
In this document, we present our approaches for understanding and discovering scalability faults, i.e., faults whose symptoms appear at larger scales but are not visible at smaller scales …
Cloud systems are becoming increasingly complex and performance bugs are inevitable. Performance bugs are notoriously difficult to debug and fix due to lack of diagnostic …
CA Stuardo, HN Zhu, PJ Chapman, C Rubio-Gonzalez… - people.cs.uchicago.edu
We present SCALEVIEW, a framework for identifying and analyzing potential scalability faults in large-scale distributed systems. SCALEVIEW combines instrumentation and …
J He, T Dai, X Gu - arXiv preprint arXiv:2110.04101, 2021 - arxiv.org
Timeout bugs can cause serious availability and performance issues which are often difficult to fix due to the lack of diagnostic information. Previous work proposed solutions for fixing …
Research on large-scale systems is challenging because deploying a large system requires a great amount of resources. My approach to address this problem is based on the …