CRAC: Checkpoint-restart architecture for CUDA with streams and UVM

T Jain, G Cooperman - SC20: International Conference for High …, 2020 - ieeexplore.ieee.org
The share of the top 500 supercomputers with NVIDIA GPUs is now over 25% and continues
to grow. While fault tolerance is a critical issue for supercomputing, there does not currently …
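
Not part of the indexed entry: a minimal host-side sketch, written in C against the CUDA runtime API, of the two features the title names, a user-created stream and unified (managed) memory (UVM). It shows only the application-level calls whose state a checkpoint-restart tool such as CRAC would need to capture; the device index, buffer size, and omission of error checking are illustrative assumptions.

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        const size_t n = 1 << 20;
        float *buf = NULL;
        cudaStream_t stream;

        /* Managed (UVM) allocation: one pointer valid on both host and device. */
        cudaMallocManaged((void **)&buf, n * sizeof(float), cudaMemAttachGlobal);
        for (size_t i = 0; i < n; i++) buf[i] = 1.0f;   /* touched on the host */

        /* Work queued on a user-created stream executes asynchronously
         * with respect to the host thread. */
        cudaStreamCreate(&stream);
        cudaMemPrefetchAsync(buf, n * sizeof(float), 0 /* device 0 */, stream);
        cudaStreamSynchronize(stream);

        cudaStreamDestroy(stream);
        cudaFree(buf);
        printf("done\n");
        return 0;
    }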

Contention-aware kernel-assisted MPI collectives for multi-/many-core systems

S Chakraborty, H Subramoni… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Multi-/many-core CPU-based architectures are seeing widespread adoption due to their
unprecedented compute performance in a small power envelope. With the increasingly …

Designing efficient shared address space reduction collectives for multi-/many-cores

JM Hashmi, S Chakraborty, M Bayatpour… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
State-of-the-art designs for the hierarchical reduction collective operation in MPI that work on
the concept of distributed address spaces incur the cost of intermediate copies inside the …

Optimizing MPI Collectives on Shared Memory Multi-Cores

J Peng, J Fang, J Liu, M Xie, Y Dai, B Yang… - Proceedings of the …, 2023 - dl.acm.org
Message Passing Interface (MPI) programs often experience performance slowdowns due
to collective communication operations, like broadcasting and reductions. As modern CPUs …
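
Not part of the indexed entry: a minimal C sketch of the two collectives the abstract names, broadcast and reduction. A shared-memory-aware MPI library optimizes these calls internally; the application-side code below is just the standard MPI API, with an arbitrary root and payload.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value = 0, sum = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) value = 42;                            /* root supplies the data */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);     /* broadcast              */
        MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0,
                   MPI_COMM_WORLD);                           /* reduction              */

        if (rank == 0) printf("sum = %d\n", sum);
        MPI_Finalize();
        return 0;
    }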

Gait analysis for human identification in frequency domain

S Yu, L Wang, W Hu, T Tan - … on Image and Graphics (ICIG'04), 2004 - ieeexplore.ieee.org
In this paper, we analyze the spatio-temporal human characteristic of moving silhouettes in
frequency domain, and find key Fourier descriptors that have better discriminatory capability …

Scalable MPI collectives using SHARP: Large-scale performance evaluation on the TACC Frontera system

B Ramesh, KK Suresh, N Sarkauskas… - 2020 Workshop on …, 2020 - ieeexplore.ieee.org
The Message-Passing Interface (MPI) is the de facto standard for designing and executing
applications on massively parallel hardware. MPI collectives provide a convenient …
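
Not part of the indexed entry: SHARP performs in-network aggregation, so a SHARP-enabled MPI library can offload reductions such as allreduce to the switch fabric without changes to user code. The sketch below (plain C, standard MPI) shows the single application-level call involved; nothing SHARP-specific appears in the program itself.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double local = 1.0, global = 0.0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank contributes a value; every rank receives the reduced result. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) printf("global sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }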

Optimizing point-to-point communication between Adaptive MPI endpoints in shared memory

S White, LV Kale - Concurrency and Computation: Practice and …, 2020 - Wiley Online Library
Adaptive MPI is an implementation of the MPI standard that supports the virtualization of
ranks as user‐level threads, rather than OS processes. In this work, we optimize the …

CAB-MPI: Exploring interprocess work-stealing towards balanced MPI communication

K Ouyang, M Si, A Hori, Z Chen… - … Conference for High …, 2020 - ieeexplore.ieee.org
Load balance is essential for high-performance applications. Unbalanced communication
can cause severe performance degradation, even in computation-balanced BSP …

Cooperative rendezvous protocols for improved performance and overlap

S Chakraborty, M Bayatpour, J Hashmi… - … Conference for High …, 2018 - ieeexplore.ieee.org
With the emergence of larger multi-/many-core clusters and new areas of HPC applications,
performance of large message communication is becoming more important. MPI libraries …
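
Not part of the indexed entry: a minimal C sketch of the communication/computation overlap that rendezvous protocols for large messages aim to preserve, expressed with standard nonblocking point-to-point calls. The rank pairing and message size are illustrative assumptions (an even number of ranks is assumed).

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define N (1 << 20)

    int main(int argc, char **argv) {
        int rank;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        double *sendbuf = malloc(N * sizeof(double));
        double *recvbuf = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) sendbuf[i] = rank;

        int peer = rank ^ 1;   /* pair ranks 0<->1, 2<->3, ...            */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... independent computation here can overlap with the transfer ... */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        if (rank == 0) printf("received %f from rank %d\n", recvbuf[0], peer);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }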

Performance comparison of cross memory attach capable MPI vs. multithreaded optimistic parallel simulations

DM Rao - Proceedings of the 2018 ACM SIGSIM Conference on …, 2018 - dl.acm.org
The growth in many-core CPUs has motivated development of shared-memory,
multithreaded solutions to minimize communication and synchronization overheads in …
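
Not part of the indexed entry: a minimal C sketch of the Linux cross-memory-attach primitive, process_vm_readv(), which CMA-capable MPI libraries use to copy large messages between process address spaces in a single copy. Here a parent reads a buffer out of its forked child; the sleep-based synchronization and buffer contents are only for illustration.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <sys/wait.h>

    static char message[64];

    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {                 /* child: fill the buffer, keep it alive */
            strcpy(message, "hello from the child");
            sleep(2);
            return 0;
        }

        sleep(1);                       /* crude: give the child time to write */
        char out[64] = {0};
        struct iovec local  = { .iov_base = out,     .iov_len = sizeof(out) };
        struct iovec remote = { .iov_base = message, .iov_len = sizeof(out) };

        /* Same virtual address in the child because of fork(); one syscall copies
         * the bytes straight from the child's address space into ours. */
        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        printf("read %zd bytes: \"%s\"\n", n, out);

        waitpid(pid, NULL, 0);
        return 0;
    }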