Predicting DRAM reliability in the field with machine learning

I Giurgiu, J Szabo, D Wiesmann, J Bird - … of the 18th ACM/IFIP/USENIX …, 2017 - dl.acm.org
Uncorrectable errors in dynamic random access memory (DRAM) are a common form of
hardware failure in server clusters. Failures are costly both in terms of hardware …

Enhanced reliability modeling of RAID storage systems

JG Elerath, M Pecht - 37th Annual IEEE/IFIP International …, 2007 - ieeexplore.ieee.org
A flexible model for estimating reliability of RAID storage systems is presented. This model
corrects errors associated with the common assumption that system times to failure follow a …

On the performance variation in modern storage stacks

Z Cao, V Tarasov, HP Raman, D Hildebrand… - … USENIX conference on …, 2017 - usenix.org
Ensuring stable performance for storage stacks is important, especially with the growth in
popularity of hosted services where customers expect QoS guarantees. The same …

Random-forest-based failure prediction for hard disk drives

J Shen, J Wan, SJ Lim, L Yu - International Journal of …, 2018 - journals.sagepub.com
Failure prediction for hard disk drives is a typical and effective approach to improve the
reliability of storage systems. In a large-scale data center environment, the various brands …

Hybrid approach for improving the performance of data reliability in cloud storage management

A Alzahrani, T Alyas, K Alissa, Q Abbas, Y Alsaawy… - Sensors, 2022 - mdpi.com
The digital transformation disrupts the various professional domains in different ways,
though one aspect is common: the unified platform known as cloud computing. Corporate …

Large scale predictive analytics for hard disk remaining useful life estimation

P Anantharaman, M Qiao… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Hard disk failure prediction plays an important role in reducing data center downtime and
improving service reliability. In contrast to existing work of modeling the prediction problem …

Disk failure prediction in data centers via online learning

J Xiao, Z Xiong, S Wu, Y Yi, H Jin, K Hu - Proceedings of the 47th …, 2018 - dl.acm.org
Disk failure has become a major concern with the rapid expansion of storage systems in
data centers. Based on SMART (Self-Monitoring, Analysis and Reporting Technology) …

Reliability analysis of SSDs under power fault

M Zheng, J Tucek, F Qin, M Lillibridge… - ACM Transactions on …, 2016 - dl.acm.org
Modern storage technology (solid-state disks (SSDs), NoSQL databases, commoditized
RAID hardware, etc.) brings new reliability challenges to the already-complicated storage …

Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime

Y Cai, G Yalcin, O Mutlu, EF Haratsch… - 2012 IEEE 30th …, 2012 - ieeexplore.ieee.org
With the continued scaling of NAND flash and multi-level cell technology, flash-based
storage has gained widespread use in systems ranging from mobile platforms to enterprise …

Being accurate is not enough: New metrics for disk failure prediction

J Li, RJ Stones, G Wang, Z Li, X Liu… - 2016 IEEE 35th …, 2016 - ieeexplore.ieee.org
Traditionally, disk failure prediction accuracy is used to evaluate disk failure prediction
models. However, accuracy may not reflect their practical usage (protecting against failures …