The state of the art of metadata managements in large-scale distributed file systems—scalability, performance and availability

H Dai, Y Wang, KB Kent, L Zeng… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
File system metadata is the data in charge of maintaining namespace, permission semantics
and location of file data blocks. Operations on the metadata can account for up to 80% of …

Ad hoc file systems for high-performance computing

A Brinkmann, K Mohror, W Yu, P Carns… - Journal of Computer …, 2020 - Springer
Storage backends of parallel compute clusters are still based mostly on magnetic disks,
while newer and faster storage technologies such as flash-based SSDs or non-volatile …

Simurgh: a fully decentralized and secure NVMM user space file system

N Moti, F Schimmelpfennig, R Salkhordeh… - Proceedings of the …, 2021 - dl.acm.org
The availability of non-volatile main memory (NVMM) has started a new era for storage
systems and NVMM specific file systems can support extremely high data and metadata …

CHFS: Parallel consistent hashing file system for node-local persistent memory

O Tatebe, K Obata, K Hiraga, H Ohtsuji - International Conference on …, 2022 - dl.acm.org
This paper proposes a design for CHFS, an ad hoc parallel file system that utilizes the
persistent memory of compute nodes. The design is based entirely on a highly scalable …

HVAC: Removing I/O bottleneck for large-scale deep learning applications

A Khan, AK Paul, C Zimmer, S Oral… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Scientific communities are increasingly adopting deep learning (DL) models in their
applications to accelerate scientific discovery processes. However, with rapid growth in the …

GIFT: A coupon based Throttle-and-Reward mechanism for fair and efficient I/O bandwidth management on parallel storage systems

T Patel, R Garg, D Tiwari - 18th USENIX Conference on File and Storage …, 2020 - usenix.org
Large-scale parallel applications are highly data-intensive and perform terabytes of I/O
routinely. Unfortunately, on a large-scale system where multiple applications run …

MetaWBC: POSIX-compliant metadata write-back caching for distributed file systems

Y Qian, W Cheng, L Zeng, MA Vef… - … Conference for High …, 2022 - ieeexplore.ieee.org
In parallel and distributed file systems, caching can improve data performance and metadata
operations. Currently, most distributed file systems adopt a write-back data cache for …

Improving checkpointing intervals by considering individual job failure probabilities

A Frank, M Baumgartner, R Salkhordeh… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Checkpointing is a popular resilience method in HPC and its efficiency highly depends on
the choice of the checkpoint interval. Standard analytical approaches optimize intervals for …

Combining Buffered I/O and Direct I/O in Distributed File Systems

Y Qian, MA Vef, P Farrell, A Dilger, X Li… - … USENIX Conference on …, 2024 - usenix.org
Direct I/O allows I/O requests to bypass the Linux page cache and was introduced over 20
years ago as an alternative to the default buffered I/O mode. However, high-performance …

NVMM-oriented hierarchical persistent client caching for Lustre

W Cheng, C Li, L Zeng, Y Qian, X Li… - ACM Transactions on …, 2021 - dl.acm.org
In high-performance computing (HPC), data and metadata are stored on special server
nodes and client applications access the servers' data and metadata through a network …