Making disk failure predictions SMARTer!

S Lu, B Luo, T Patel, Y Yao, D Tiwari… - 18th USENIX Conference …, 2020 - usenix.org
Disk drives are one of the most commonly replaced hardware components and continue to
pose challenges for accurate failure prediction. In this work, we present analysis and …

Perseus: A Fail-Slow detection framework for cloud storage systems

R Lu, E Xu, Y Zhang, F Zhu, Z Zhu, M Wang… - … USENIX Conference on …, 2023 - usenix.org
The newly-emerging "fail-slow" failures plague both software and hardware where the victim
components are still functioning yet with degraded performance. To address this problem …

More than capacity: Performance-oriented evolution of Pangu in Alibaba

Q Li, Q Xiang, Y Wang, H Song, R Wen, W Yao… - … USENIX Conference on …, 2023 - usenix.org
This paper presents how the Pangu storage system continuously evolves with hardware
technologies and the business model to provide high-performance, reliable storage services …

An in-depth analysis of cloud block storage workloads in large-scale production

J Li, Q Wang, PPC Lee, C Shi - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Cloud block storage systems support diverse types of applications in modern cloud services.
Characterizing their I/O activities is critical for guiding better system designs and …

Revisiting I/O behavior in large-scale storage systems: The expected and the unexpected

T Patel, S Byna, GK Lockwood, D Tiwari - Proceedings of the …, 2019 - dl.acm.org
Large-scale applications typically spend a large fraction of their execution time performing
I/O to a parallel storage system. However, with rapid progress in compute and storage …

A study of SSD reliability in large scale enterprise storage deployments

S Maneas, K Mahdaviani, T Emami… - 18th USENIX Conference …, 2020 - usenix.org
This paper presents the first large-scale field study of NAND-based SSDs in enterprise
storage systems (in contrast to drives in distributed data center storage systems). The study …

Multi-view feature-based SSD failure prediction: What, when, and why

Y Zhang, W Hao, B Niu, K Liu, S Wang, N Liu… - … USENIX Conference on …, 2023 - usenix.org
Solid state drives (SSDs) play an important role in large-scale data centers. SSD failures
affect the stability of storage systems and cause additional maintenance overhead. To …

Fighting the fog of war: Automated incident detection for cloud systems

L Li, X Zhang, X Zhao, H Zhang, Y Kang… - 2021 USENIX Annual …, 2021 - usenix.org
Incidents and outages dramatically degrade the availability of large-scale cloud computing
systems such as AWS, Azure, and GCP. In current incident response practice, each team …

Operational Characteristics of SSDs in Enterprise Storage Systems: A Large-Scale Field Study

S Maneas, K Mahdaviani, T Emami… - 20th USENIX Conference …, 2022 - usenix.org
As we increasingly rely on SSDs for our storage needs, it is important to understand their
operational characteristics in the field, in particular since they vary from HDDs. This includes …

Shaving retries with sentinels for fast read over high-density 3D flash

Q Li, M Ye, Y Cui, L Shi, X Li, TW Kuo… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
High-density flash-memory chips are under tremendous demands with the exponential
growth of data. At the same time, the slow read performance of these high-density flash …