Characterizing communication and page usage of parallel applications for thread and data mapping

M Diener, EHM Cruz, LL Pilla, F Dupros… - Performance …, 2015 - Elsevier
The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …

Optimizing the Bruck algorithm for non-uniform all-to-all communication

K Fan, T Gilray, V Pascucci, X Huang… - Proceedings of the 31st …, 2022 - dl.acm.org
In MPI, collective routines MPI_Alltoall and MPI_Alltoallv play an important role in facilitating
all-to-all inter-process data exchange. MPI_Alltoallv is a generalization of MPI_Alltoall …

kMAF: Automatic kernel-level management of thread and data affinity

M Diener, EHM Cruz, POA Navaux, A Busse… - Proceedings of the 23rd …, 2014 - dl.acm.org
One of the main challenges for parallel architectures is the increasing complexity of the
memory hierarchy, which consists of several levels of private and shared caches, as well as …

Hybrid MPI: efficient message passing for multi-core systems

A Friedley, G Bronevetsky, T Hoefler… - Proceedings of the …, 2013 - dl.acm.org
Multi-core shared memory architectures are ubiquitous in both High-Performance
Computing (HPC) and commodity systems because they provide an excellent trade-off …

HAN: A hierarchical autotuned collective communication framework

X Luo, W Wu, G Bosilca, Y Pei, Q Cao… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
High-performance computing (HPC) systems keep growing in scale and heterogeneity to
satisfy the increasing computational need, and this brings new challenges to the design of …

Parallel discrete event simulation for multi-core systems: Analysis and optimization

J Wang, D Jagtap, N Abu-Ghazaleh… - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
Parallel Discrete Event Simulation (PDES) can substantially improve the performance and
capacity of simulation, allowing the study of larger, more detailed models, in less time. PDES …

FreeFlow: High performance container networking

T Yu, SA Noghabi, S Raindel, H Liu, J Padhye… - Proceedings of the 15th …, 2016 - dl.acm.org
With the tremendous popularity gained by container technology, many applications are
being containerized: splitting into numerous containers connected by networks. However …

Contention-aware kernel-assisted MPI collectives for multi-/many-core systems

S Chakraborty, H Subramoni… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Multi-/many-core CPU based architectures are seeing widespread adoption due to their
unprecedented compute performance in a small power envelope. With the increasingly …

Designing efficient shared address space reduction collectives for multi-/many-cores

JM Hashmi, S Chakraborty, M Bayatpour… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
State-of-the-art designs for the hierarchical reduction collective operation in MPI that work on
the concept of distributed address spaces incur the cost of intermediate copies inside the …

Optimizing MPI Collectives on Shared Memory Multi-Cores

J Peng, J Fang, J Liu, M Xie, Y Dai, B Yang… - Proceedings of the …, 2023 - dl.acm.org
Message Passing Interface (MPI) programs often experience performance slowdowns due
to collective communication operations, like broadcasting and reductions. As modern CPUs …