Characterization and prediction of performance loss and MTTR during fault recovery on scale-out storage using DOE & RSM: a case study with Ceph

LW Kong, O Moreno - IEEE Transactions on Cloud Computing, 2018 - ieeexplore.ieee.org
Recognizing the impact from cluster recovery operations on performance and mean time to
recovery (MTTR) is essential to maintain service and availability objectives. Testing the …

A proactive failure tolerant mechanism for SSDs storage systems based on unsupervised learning

H Zhou, Z Niu, G Wang, XG Liu, D Liu… - 2021 IEEE/ACM 29th …, 2021 - ieeexplore.ieee.org
As a proactive failure tolerant mechanism in large scale cloud storage systems, drive failure
prediction can be used to protect data by early warning before real failures of drives, and …

Optimizing communication performance in scale-out storage system

U Song, B Jeong, S Park, K Lee - Cluster Computing, 2019 - Springer
Ceph is an object-based scale-out storage system that is widely used in the cloud computing
environment due to its scalable and reliable characteristics. Although there are many factors …

Hybrid approach for improving the performance of data reliability in cloud storage management

A Alzahrani, T Alyas, K Alissa, Q Abbas, Y Alsaawy… - Sensors, 2022 - mdpi.com
The digital transformation disrupts the various professional domains in different ways,
though one aspect is common: the unified platform known as cloud computing. Corporate …

HFBT: An Efficient Hierarchical Fault-tolerant Method for Cloud Storage System

L Xiao, B Zou, C Zhu, M Zeng… - 2021 IEEE Intl Conf on …, 2021 - ieeexplore.ieee.org
With the development of information technology and wide application of smart devices, data
increases exponentially and more and more data are stored in the cloud, whereas most of …

Bandwidth-aware delayed repair in distributed storage systems

J Shen, J Gu, Y Zhou, X Wang - 2016 IEEE/ACM 24th …, 2016 - ieeexplore.ieee.org
In data storage systems, data are typically stored in redundant storage nodes to ensure
storage reliability. When storage nodes fail, with the help of the redundant nodes, the lost …

Multi-view feature-based SSD failure prediction: What, when, and why

Y Zhang, W Hao, B Niu, K Liu, S Wang, N Liu… - … USENIX Conference on …, 2023 - usenix.org
Solid state drives (SSDs) play an important role in large-scale data centers. SSD failures
affect the stability of storage systems and cause additional maintenance overhead. To …

Predicting hard drive failures for cloud storage systems

D Liu, B Wang, P Li, RJ Stones, TG Marbach… - … and Architectures for …, 2020 - Springer
To improve reactive hard-drive fault-tolerance techniques, many statistical and machine
learning methods have been proposed for failure prediction based on SMART attributes …

Understanding the resiliency of cloud storage services

A Ghosh, J Lakshmi - 2022 IEEE 27th Pacific Rim International …, 2022 - ieeexplore.ieee.org
A cloud storage system requires multiple functional and management layers to render a
global-scale storage solution. Providing a reliable service through this complex architecture …

Incorporating Proactive Data Rescue into ZFS Disk Recovery for Enhanced Storage Reliability

Z Qiao, S Fu, H Chen, M Lang - 2017 - sc17.supercomputing.org
Computations and simulations help advance knowledge in science, energy, and national
security. Over the years, they have become more accurate to generate more realistic …