Towards Resilient Method: An exhaustive survey of fault tolerance methods in the cloud computing environment

MA Shahid, N Islam, MM Alam, MS Mazliham… - Computer Science …, 2021 - Elsevier
Transient Faults: According to some … fault tolerance include forward recovery, backward
recovery & checkpoint. Backward recovery and forward recovery are two fundamental fault

Reinit: Evaluating the performance of global-restart recovery methods for MPI fault tolerance

G Georgakoudis, L Guo, I Laguna - International Conference on High …, 2020 - Springer
… for MPI fault tolerance. Specifically, it provides an overview of the recovery models for …
The causes for those failures may be transient faults or hard faults of hardware components. …

A survey of fault-tolerance techniques for embedded systems from the perspective of power, energy, and thermal issues

S Safari, M Ansari, H Khdr, P Gohari-Nazari… - IEEE …, 2022 - ieeexplore.ieee.org
… techniques aim at detecting the faults and recover from them (if possible), to let the system
… This work has considered transient and permanent faults that are caused by thermal cycling. …

Improving availability of multicore real-time systems suffering both permanent and transient faults

J Zhou, XS Hu, Y Ma, J Sun, T Wei… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
… to tolerate up to one transient fault for each task since single-fault tolerance is a common
assumption [10]. Given task ti executing at frequency fi with a recovery task running at the same …

[BOOK][B] Fault-tolerant systems

I Koren, CM Krishna - 2020 - books.google.com
over noisy channels, which are channels that are subject to many transient failures. These
… If we know that devices can partially recover from stress, we can schedule rest periods to …

[HTML] A survey of fault tolerance in cloud computing

P Kumari, P Kaur - Journal of King Saud University-Computer and …, 2021 - Elsevier
… rollback recovery, is a widely used policy to handle faults in … The preceding recovery-based
approaches are not applied … to know the impact of transient failures (a failure that occurs for …

Fault tolerance in iterative-convergent machine learning

A Qiao, B Aragam, B Zhang… - … Conference on Machine …, 2019 - proceedings.mlr.press
… of partial recovery from checkpoints, we simulate failures of varying … full recovery with the
rework costs incurred by partial recovery. For each model, we sample the failure iteration from a …

BOND: Flexible failure recovery in software defined networks

Q Li, Y Liu, Z Zhu, H Li, Y Jiang - Computer Networks, 2019 - Elsevier
… BOND, a flexible failure recovery system in SDNs. Firstly, … failure recovery with a global
hash table and precisely select backup paths to avoid potential congestions in the post-recovery

Peak-power-aware primary-backup technique for efficient fault-tolerance in multicore embedded systems

M Ansari, M Salehi, S Safari, A Ejlali, M Shafique - IEEE Access, 2020 - ieeexplore.ieee.org
transient fault rate is equal 10^-7 faults per second [23], [44]. To achieve fault tolerance against
transient faults, … Elnozahy, “The interplay of power management and fault recovery in real-…

Energy efficient fault tolerance techniques in green cloud computing: A systematic survey and taxonomy

S Bharany, S Badotra, S Sharma, S Rani… - Sustainable Energy …, 2022 - Elsevier
… High fault tolerance in the cloud is a must to attain high … is to gain insight into the fault tolerance
techniques that are available … the fault recovery methods after the fault tolerance system …