P3DFFT: A framework for parallel computations of Fourier transforms in three dimensions

D Pekurovsky - SIAM Journal on Scientific Computing, 2012 - SIAM
Fourier and related transforms are a family of algorithms widely employed in diverse areas
of computational science, notoriously difficult to scale on high-performance parallel …

BluesMPI: Efficient MPI non-blocking Alltoall offloading designs on modern BlueField smart NICs

M Bayatpour, N Sarkauskas, H Subramoni… - … Conference on High …, 2021 - Springer
In the state-of-the-art production quality MPI (Message Passing Interface) libraries,
communication progress is either performed by the main thread or a separate …

Using run-time reconfiguration for fault injection in hardware prototypes

L Antoni, R Leveugle, M Feher - 17th IEEE International …, 2002 - ieeexplore.ieee.org
In this paper, a new methodology for the injection of single event upsets (SEU) in memory
elements is introduced. SEUs in memory elements can occur due to many reasons (e.g. …

AccFFT: A library for distributed-memory FFT on CPU and GPU architectures

A Gholami, J Hill, D Malhotra, G Biros - arXiv preprint arXiv:1506.07933, 2015 - arxiv.org
We present a new library for parallel distributed Fast Fourier Transforms (FFT). The
importance of FFT in science and engineering and the advances in high performance …

Efficient design for MPI asynchronous progress without dedicated resources

A Ruhela, H Subramoni, S Chakraborty, M Bayatpour… - Parallel Computing, 2019 - Elsevier
The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

Scalable reduction collectives with data partitioning-based multi-leader design

M Bayatpour, S Chakraborty, H Subramoni… - Proceedings of the …, 2017 - dl.acm.org
Existing designs for MPI_Allreduce do not take advantage of the vast parallelism available in
modern multi-/many-core processors like Intel Xeon/Xeon Phis or the increases in …

The MVAPICH project: Evolution and sustainability of an open source production quality MPI library for HPC

DK Panda, K Tomko, K Schulz… - … with Int'l …, 2013 - pfigshare-u-files.s3.amazonaws.com
I. OVERVIEW OF THE MVAPICH PROJECT The MVAPICH (for MPI-1) and MVAPICH2 (for
MPI-2 and MPI-3) open-source libraries [?] have been designed and developed during the …

Efficient asynchronous communication progress for MPI without dedicated resources

A Ruhela, H Subramoni, S Chakraborty… - Proceedings of the 25th …, 2018 - dl.acm.org
The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models

W Wang, Z Lai, S Li, W Liu, K Ge, Y Liu… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Mixture of Expert (MoE) has received increasing attention for scaling DNN models to
extra-large size with negligible increases in computation. The MoE model has achieved the …

Interim report on benchmarking FFT libraries on high performance systems

A Ayala, S Tomov, P Luszczek, S Cayrols… - … of Tennessee, ICL …, 2021 - icl.utk.edu
The Fast Fourier Transform (FFT) is used in many applications such as molecular
dynamics, spectrum estimation, fast convolution and correlation, signal modulation, and …