Unified Virtual Memory (UVM) was recently introduced with CUDA version 8 and the Pascal GPU. The older CUDA programming style is akin to older large-memory UNIX applications …
T Jain, G Cooperman - SC20: International Conference for High …, 2020 - ieeexplore.ieee.org
The share of the top 500 supercomputers with NVIDIA GPUs is now over 25% and continues to grow. While fault tolerance is a critical issue for supercomputing, there does not currently …
Driven by application diversification and market needs, software systems are integrating new features rapidly. However, this “feature creep” can compromise software security, as …
J Cao, K Arya, R Garg, S Matott… - 2016 IEEE 22nd …, 2016 - ieeexplore.ieee.org
Fault tolerance for the upcoming exascale generation has long been an area of active research. One of the components of a fault tolerance strategy is checkpointing. Petascale …
Network-based deployments within the Internet of Things increasingly rely on the cloud-controlled federation of individual networks to configure, authorize, and manage devices …
Transparently checkpointing MPI for fault tolerance and load balancing is a long-standing problem in HPC. The problem has been complicated by the need to provide checkpoint …
Many massive data processing applications now need long, continuous, and uninterrupted data access. Distributed file systems are used as the back-end storage to …
MANA-2.0 is a scalable, future-proof design for transparent checkpointing of MPI-based computations. Its network transparency (“network-agnostic”) feature ensures that MANA-2.0 …
Typical Internet of Things devices are under-powered and have limited RAM. This is due to energy and cost concerns. Yet, IoT applications require increasingly complex …