Monitoring and analyzing I/O behaviors is critical to the efficient utilization of parallel storage systems. Unfortunately, with increasing I/O requirements and resource contention, I/O …
Scientific computing workloads at HPC facilities have been shifting from traditional numerical simulations to AI/ML applications for training and inference while processing and …
AK Paul, O Faaland, A Moody… - 2020 IEEE 27th …, 2020 - ieeexplore.ieee.org
The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance …
S Wang, Q Cao, Z Lu, H Jiang, J Yao… - 2022 USENIX Annual …, 2022 - usenix.org
Popular software storage architecture Linux Multiple-Disk (MD) for parity-based RAID (e.g., RAID5 and RAID6) assigns one or more centralized worker threads to efficiently process all …
Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training …
B Yang, H Wei, W Zhu, Y Zhang, W Liu… - 2024 USENIX Annual …, 2024 - usenix.org
The system architecture of contemporary supercomputers is growing increasingly intricate with the ongoing evolution of system-wide network and storage technologies, making it …
S Wang, Q Cao, H Jiang, Z Lu, J Yao, Y Chen… - ACM Transactions on …, 2024 - dl.acm.org
Following a conventional design principle that pays more fast-CPU-cycles for fewer slow-I/Os, popular software storage architecture Linux Multiple-Disk (MD) for parity-based RAID …
This paper presents a scalable page cache called ScaleCache for improving SSD scalability. Specifically, we first propose a concurrent data structure of page cache based on …
The combination of ever-growing scientific datasets and distributed workflow complexity creates I/O performance bottlenecks due to data volume, velocity, and variety. Although the …