Checkpointing is an I/O intensive operation increasingly used by High-Performance Computing (HPC) applications to revisit previous intermediate datasets at scale. Unlike the …
LLMs have seen rapid adoption in all domains. They need to be trained on high-end high- performance computing (HPC) infrastructures and ingest massive amounts of input data …
Function-as-a-Service (FaaS) platforms have recently gained rapid popularity. Many stateful applications have been migrated to FaaS platforms due to their ease of deployment …
Efficient checkpointing of distributed data structures periodically at key moments during runtime is a recurring fundamental pattern in a large number of uses cases: fault tolerance …
N Eiling, J Baude, S Lankes… - … and Computation: Practice …, 2022 - Wiley Online Library
In high‐performance computing and cloud computing the introduction of heterogeneous computing resources, such as GPU accelerator have led to a dramatic increase in …
While many HPC applications are known to have long runtimes, this is not always because of single large runs: in many cases, this is due to ensembles composed of many short runs …
N Tan, J Luettgau, J Marquez, K Teranishi… - Proceedings of the …, 2023 - dl.acm.org
Writing large amounts of data concurrently to stable storage is a typical I/O pattern of many HPC workflows. This pattern introduces high I/O overheads and results in increased storage …
C Santana, RCF Araújo, IM Sardina, ÍAS Assis… - Computers & …, 2024 - Elsevier
Many geophysical imaging applications, such as full-waveform inversion, often rely on high- performance computing to meet their demanding computational requirements. The failure of …
High-Performance Computing (HPC) workloads generate large volumes of data at high- frequency during their execution, which needs to be captured concurrently at scale. These …