ARCHIE: Data analysis acceleration with array caching in hierarchical storage

B Dong, T Wang, H Tang, Q Koziol… - … Conference on Big …, 2018 - ieeexplore.ieee.org
Scientific data analysis typically involves reading massive amounts of data that was
generated by simulations, experiments, and observations. Performance of reading such …

A holistic heterogeneity-aware data placement scheme for hybrid parallel I/O systems

S He, Z Li, J Zhou, Y Yin, X Xu… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
We present H2DP, a holistic heterogeneity-aware data placement scheme for hybrid parallel
I/O systems, which consist of HDD servers and SSD servers. Most of the existing approaches …

Parallel query service for object-centric data management systems

H Tang, S Byna, B Dong… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
While large-scale scientific experiments and simulations produce massive amounts of data,
a small fraction of data contains useful information. Efficient querying on such volume of data …

Apollo: An ML-assisted real-time storage resource observer

N Rajesh, H Devarajan, JC Garcia, K Bateman… - Proceedings of the 30th …, 2021 - dl.acm.org
Applications and middleware services, such as data placement engines, I/O scheduling, and
prefetching engines, require low-latency access to telemetry data in order to make optimal …

Scale-space splatting: Reforming spacetime for cross-scale exploration of integral measures in molecular dynamics

J Pálenik, J Byška, S Bruckner… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
Understanding large amounts of spatiotemporal data from particle-based simulations, such
as molecular dynamics, often relies on the computation and analysis of aggregate …

DIRAQ: scalable in situ data-and resource-aware indexing for optimized query performance

S Lakshminarasimhan, X Zou, DA Boyuka, SV Pendse… - Cluster computing, 2014 - Springer
Scientific data analytics in high-performance computing environments has been evolving
along with the advancement of computing capabilities. With the onset of exascale …

HReplica: a dynamic data replication engine with adaptive compression for multi-tiered storage

H Devarajan, A Kougkas, XH Sun - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
As the diversity of big data applications increases, their requirements diverge and often
conflict with one another. Managing this diversity in any supercomputer or data center is a …

Parallel query evaluation as a scientific data service

B Dong, S Byna, K Wu - 2014 IEEE International Conference on …, 2014 - ieeexplore.ieee.org
Scientific experiments and simulations produce mountains of data in file formats, such as
HDF5, NetCDF, and FITS. Often, a relatively small amount of data holds the key to new …

h5bench: A unified benchmark suite for evaluating HDF5 I/O performance on pre‐exascale platforms

JL Bez, H Tang, S Breitenfeld, H Zheng… - Concurrency and …, 2024 - Wiley Online Library
Parallel I/O is a critical technique for moving data between compute and storage subsystems
of supercomputers. With massive amounts of data produced or consumed by compute …

Toward transparent data management in multi-layer storage hierarchy of HPC systems

B Wadhwa, S Byna, AR Butt - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Upcoming exascale high performance computing (HPC) systems are expected to comprise a
multi-tier storage hierarchy, and thus will necessitate innovative storage and I/O …