In MPI, the collective routines MPI_Alltoall and MPI_Alltoallv play an important role in all-to-all inter-process data exchange. MPI_Alltoallv is a generalization of MPI_Alltoall …
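To make the relationship between the two routines concrete, here is a hypothetical pure-Python sketch (no MPI runtime) of the data movement MPI_Alltoallv performs: each rank i sends sendcounts[i][j] items to rank j, while MPI_Alltoall is the special case where every pairwise count is the same. The function name and buffer layout are illustrative assumptions, not the papers' implementations.

```python
def alltoallv(send_bufs, send_counts):
    """Simulate MPI_Alltoallv data movement.

    send_bufs[i]:      flat send buffer on rank i
    send_counts[i][j]: number of items rank i sends to rank j
    Offsets play the role of MPI's sdispls (prefix sums of counts).
    """
    n = len(send_bufs)
    recv = [[] for _ in range(n)]
    for i in range(n):
        off = 0
        for j in range(n):
            c = send_counts[i][j]
            recv[j].extend(send_bufs[i][off:off + c])
            off += c
    return recv

# MPI_Alltoall corresponds to a uniform count of 1 item per pair:
bufs = [[10, 11], [20, 21]]
uniform = [[1, 1], [1, 1]]
print(alltoallv(bufs, uniform))  # → [[10, 20], [11, 21]]
```

With non-uniform counts (e.g. rank 0 sending nothing to itself and two items to rank 1), the same routine expresses the irregular exchanges MPI_Alltoallv supports.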
One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as …
Multi-core shared memory architectures are ubiquitous in both High-Performance Computing (HPC) and commodity systems because they provide an excellent trade-off …
High-performance computing (HPC) systems keep growing in scale and heterogeneity to satisfy increasing computational needs, which brings new challenges to the design of …
J Wang, D Jagtap, N Abu-Ghazaleh… - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
Parallel Discrete Event Simulation (PDES) can substantially improve the performance and capacity of simulation, allowing the study of larger, more detailed models, in less time. PDES …
With the tremendous popularity gained by container technology, many applications are being containerized: split into numerous containers connected by networks. However …
Multi-/many-core CPU-based architectures are seeing widespread adoption due to their unprecedented compute performance in a small power envelope. With the increasingly …
State-of-the-art designs for the hierarchical reduction collective operation in MPI that work on the concept of distributed address spaces incur the cost of intermediate copies inside the …
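The hierarchical scheme referred to here can be illustrated with a hypothetical two-level reduction sketch: ranks first reduce within their node to a per-node partial, then the node-level partials are reduced across nodes. The per-node staging dictionary below stands in for the intermediate copies that shared-address-space designs aim to avoid; all names are illustrative assumptions.

```python
from functools import reduce
import operator

def hierarchical_reduce(values, node_of_rank, op=operator.add):
    """Two-level reduction sketch.

    values[r]:       contribution of rank r
    node_of_rank[r]: node hosting rank r
    """
    # Phase 1: intra-node reduction into one partial result per node
    # (this staging buffer models the intermediate copy).
    partials = {}
    for r, v in enumerate(values):
        node = node_of_rank[r]
        partials[node] = op(partials[node], v) if node in partials else v
    # Phase 2: inter-node reduction over the per-node partials.
    return reduce(op, partials.values())

# 4 ranks spread over 2 nodes; matches a flat reduction of all values.
print(hierarchical_reduce([1, 2, 3, 4], [0, 0, 1, 1]))  # → 10
```

The result is identical to a flat MPI_Reduce; the hierarchy only changes where the partial sums are materialized, which is exactly the copy cost these designs target.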
J Peng, J Fang, J Liu, M Xie, Y Dai, B Yang… - Proceedings of the …, 2023 - dl.acm.org
Message Passing Interface (MPI) programs often experience performance slowdowns due to collective communication operations, such as broadcasts and reductions. As modern CPUs …