Exploit both SMART Attributes and NAND Flash Wear Characteristics to Effectively Forecast SSD-based Storage Failures in Clusters

Y Gu, C Wu, X He - … USENIX Annual Technical Conference (USENIX ATC …, 2024 - usenix.org
Solid State Drives (SSDs) based on flash technology are extensively employed as high-
performance storage solutions in supercomputing data centers. However, SSD failures are …

RL-Watchdog: A Fast and Predictable SSD Liveness Watchdog on Storage Systems

JY Ha, S Lee, HY Yeom, Y Son - 2024 USENIX Annual Technical …, 2024 - usenix.org
This paper proposes a reinforcement learning-based watchdog (RLW) that examines solid-
state drive (SSD) liveness or failures by faults (e.g., controller/power faults and high …

A disk failure prediction model for multiple issues

Y Guan, Y Liu, K Zhou, Q Li, T Wang, H Li - Frontiers of Information …, 2023 - Springer
Disk failure prediction methods have been useful in handling a single issue, e.g.,
heterogeneous disks, model aging, and minority samples. However, because these issues …

MSFRD: Mutation Similarity based SSD Failure Rating and Diagnosis for Complex and Volatile Production Environments

Y Zhang, T Zhang, W Hao, S Wang, N Liu… - 2024 USENIX Annual …, 2024 - usenix.org
SSD failures have an increasing impact on storage reliability and performance in data
centers. Some manufacturers have customized fine-grained Telemetry attributes to analyze …

Ada-WL: An Adaptive Wear-Leveling Aware Data Migration Approach for Flexible SSD Array Scaling in Clusters

Y Gu, L Liu, C Wu, J Li, M Guo - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recently, the flash-based Solid State Drive (SSD) array has been widely implemented in
real-world large-scale clusters. With the increasing number of users in upper-tier …

RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

A Thomasian - arXiv preprint arXiv:2401.03235, 2024 - arxiv.org
RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity
of small disks increased 100-fold in 1990s the production of large disks was discontinued …

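A disk failure prediction model for multiple issues (Chinese-language version)

关云川, 刘渝, 周可, 李强, 王团结… - Frontiers of Information Technology & Electronic Engineering …, 2023 - fitee.zjujournals.com
Disk failure prediction methods offer mature solutions to single issues, such as disk heterogeneity,
model aging, and small samples. However, because these issues often coexist, models that can handle only one of them …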

A Hierarchical Modeling Approach for Assessing the Reliability and Performability of Burst Buffers

E Borba, R Salkhordeh, S Mimouni, E Tavares… - … on Architecture of …, 2024 - Springer
High availability is a crucial aspect of High-Performance Computing. Solid-state drives
(SSD) offer peak bandwidth as node-local burst buffers. The limited write endurance of …

Enterprise Disk Drive Scrubbing Based on Mondrian Conformal Predictors

R Vishwakarma, J Hwang, S Messoudi… - Conformal and …, 2023 - proceedings.mlr.press
Disk scrubbing is a process aimed at resolving read errors on disks by reading data from the
disk. However, scrubbing the entire storage array at once can adversely impact system …

Machine Learning Based Collaborative Prediction of SSD Failures in the Cloud

Y Jiang, R Lu, S Zhou, Q Li - 2024 International Conference on …, 2024 - ieeexplore.ieee.org
SSDs (Solid-State Drives) have become integral components in modern data centers. Under
such massive deployment, ensuring their reliability, longevity, and optimal performance is …