KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

M Diener, EHM Cruz, LL Pilla, F Dupros… - Performance …, 2015 - Elsevier

The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …

Cited by: 78 · Related articles · All 6 versions

Optimizing the Bruck algorithm for non-uniform all-to-all communication

K Fan, T Gilray, V Pascucci, X Huang… - Proceedings of the 31st …, 2022 - dl.acm.org

In MPI, collective routines MPI_Alltoall and MPI_Alltoallv play an important role in facilitating
all-to-all inter-process data exchange. MPI_Alltoallv is a generalization of MPI_Alltoall …

Cited by: 16 · Related articles · All 5 versions

kMAF: Automatic kernel-level management of thread and data affinity

M Diener, EHM Cruz, POA Navaux, A Busse… - Proceedings of the 23rd …, 2014 - dl.acm.org

One of the main challenges for parallel architectures is the increasing complexity of the
memory hierarchy, which consists of several levels of private and shared caches, as well as …

Cited by: 58 · Related articles · All 8 versions

Hybrid MPI: efficient message passing for multi-core systems

A Friedley, G Bronevetsky, T Hoefler… - Proceedings of the …, 2013 - dl.acm.org

Multi-core shared memory architectures are ubiquitous in both High-Performance
Computing (HPC) and commodity systems because they provide an excellent trade-off …

Cited by: 65 · Related articles · All 24 versions

HAN: A hierarchical autotuned collective communication framework

X Luo, W Wu, G Bosilca, Y Pei, Q Cao… - 2020 IEEE …, 2020 - ieeexplore.ieee.org

High-performance computing (HPC) systems keep growing in scale and heterogeneity to
satisfy the increasing computational need, and this brings new challenges to the design of …

Cited by: 25 · Related articles · All 8 versions

Parallel discrete event simulation for multi-core systems: Analysis and optimization

J Wang, D Jagtap, N Abu-Ghazaleh… - IEEE Transactions on …, 2013 - ieeexplore.ieee.org

Parallel Discrete Event Simulation (PDES) can substantially improve the performance and
capacity of simulation, allowing the study of larger, more detailed models, in less time. PDES …

Cited by: 60 · Related articles · All 5 versions

FreeFlow: High performance container networking

T Yu, SA Noghabi, S Raindel, H Liu, J Padhye… - Proceedings of the 15th …, 2016 - dl.acm.org

With the tremendous popularity gained by container technology, many applications are
being containerized: splitting into numerous containers connected by networks. However …

Cited by: 39 · Related articles · All 2 versions

Contention-aware kernel-assisted MPI collectives for multi-/many-core systems

S Chakraborty, H Subramoni… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

Multi-/many-core CPU based architectures are seeing widespread adoption due to their
unprecedented compute performance in a small power envelope. With the increasingly …

Cited by: 32 · Related articles · All 4 versions

Designing efficient shared address space reduction collectives for multi-/many-cores

JM Hashmi, S Chakraborty, M Bayatpour… - 2018 IEEE …, 2018 - ieeexplore.ieee.org

State-of-the-art designs for the hierarchical reduction collective operation in MPI that work on
the concept of distributed address spaces incur the cost of intermediate copies inside the …

Cited by: 30 · Related articles · All 3 versions

Optimizing MPI Collectives on Shared Memory Multi-Cores

J Peng, J Fang, J Liu, M Xie, Y Dai, B Yang… - Proceedings of the …, 2023 - dl.acm.org

Message Passing Interface (MPI) programs often experience performance slowdowns due
to collective communication operations, like broadcasting and reductions. As modern CPUs …

Cited by: 4 · Related articles · All 4 versions
