Providing consistent state to distributed storage system

LSRK Talluri, R Thirumalaisamy, R Kota, RPR Sadi… - Computers, 2021 - mdpi.com
In cloud storage systems, users must be able to shut down the application when not in use
and restart it from the last consistent state when required. BlobSeer is a data storage …

Snapshotting in Hadoop Distributed File System for Hadoop Open Platform as Service

P Motamari - Tecnico Lisboa, Thesis to obtain the Master of Science …, 2014 - dpss.inesc-id.pt
The amount of data stored in modern data centres is growing rapidly nowadays. Large-scale
distributed file systems, that maintain the massive data sets in data centres, are designed to …

A quasi-synchronous checkpointing algorithm that prevents contention for stable storage

D Manivannan, Q Jiang, J Yang, M Singhal - Information Sciences, 2008 - Elsevier
Checkpointing and rollback recovery are established techniques for handling failures in
distributed systems. Under synchronous checkpointing, each process involved in the …

Can checkpoint/restart mechanisms benefit from hierarchical data staging?

R Rajachandrasekar, X Ouyang, X Besseron… - … Conference on Parallel …, 2011 - Springer
Given the ever-increasing size of supercomputers, fault resilience and the ability to tolerate
faults have become more of a necessity than an option. Checkpoint-Restart protocols have …

Enhancing replica synchronization in Hadoop distributed file system

J Kumari, T Biswas, S Vuppala - 2018 9th International …, 2018 - ieeexplore.ieee.org
This paper presents the fault tolerance and replica synchronization among storage server
(Data-node) without the interference of metadata in Hadoop. It employs chunk list data …

A user-triggered checkpointing library for computation-intensive applications

G Deconinck, J Vounckx, R Lauwereins… - International Journal of …, 1997 - Citeseer
We propose a method to incorporate coordinated checkpointing and rollback in high
performance computing applications on massively parallel computers. A library allows the …

Staggered consistent checkpointing

NH Vaidya - IEEE Transactions on Parallel and Distributed …, 1999 - ieeexplore.ieee.org
A consistent checkpointing algorithm saves a consistent view of a distributed application's
state on stable storage. The traditional consistent checkpointing algorithms require different …

On coordinated checkpointing in distributed systems

G Cao, M Singhal - IEEE Transactions on Parallel and …, 1998 - ieeexplore.ieee.org
Coordinated checkpointing simplifies failure recovery and eliminates domino effects in case
of failures by preserving a consistent global checkpoint on stable storage. However, the …

Enhancing Hadoop system dependability through autonomous snapshot

T Yeh, Y Wang - 2018 IEEE 16th Intl Conf on Dependable …, 2018 - ieeexplore.ieee.org
The cloud computing has successfully facilitated many cutting-edge studies including
Internet of Things, Big Data, and many others in recent years. The accomplishment of cloud …

Snapshots in Hadoop distributed file system

S Agarwal, D Borthakur, I Stoica - Technical report, 2011 - sameeragarwal.github.io
The ability to take snapshots is an essential functionality of any file system, as snapshots
enable system administrators to perform data backup and recovery in case of failure. We …