Cache-efficient, intranode, large-message MPI communication with MPICH2-Nemesis

F Broquedis, J Clet-Ortega, S Moreaud… - 2010 18th Euromicro …, 2010 - ieeexplore.ieee.org

The increasing numbers of cores, shared caches and memory nodes within machines
introduces a complex hardware topology. High-performance computing applications now …

Cited by 606 - Related articles - All 19 versions

[PDF] ohio-state.edu

Optimizing MPI communication on multi-GPU systems using CUDA inter-process communication

S Potluri, H Wang, D Bureddy, AK Singh… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org

Many modern clusters are being equipped with multiple GPUs per node to achieve better
compute density and power efficiency. However, moving data in/out of GPUs continues to …

Cited by 112 - Related articles - All 15 versions

[PDF] yiwendong.com

On the Caching Schemes to Speed Up Program Reduction

Y Tian, X Zhang, Y Dong, Z Xu, M Zhang… - ACM Transactions on …, 2023 - dl.acm.org

Program reduction is a highly practical, widely demanded technique to help debug
language tools, such as compilers, interpreters and debuggers. Given a program P that …

Cited by 6 - Related articles - All 2 versions

[PDF] hal.science

KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier

The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

Cited by 111 - Related articles - All 12 versions

[PDF] ict.ac.cn

High performance interconnect network for Tianhe system

XK Liao, ZB Pang, KF Wang, YT Lu, M Xie, J Xia… - Journal of Computer …, 2015 - Springer

In this paper, we present the Tianhe-2 interconnect network and message passing services.
We describe the architecture of the router and network interface chips, and highlight a set of …

Cited by 89 - Related articles - All 8 versions

[PDF] unl.edu

FlexIO: I/O middleware for location-flexible scientific data analytics

F Zheng, H Zou, G Eisenhauer… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org

Increasingly severe I/O bottlenecks on High-End Computing machines are prompting
scientists to process simulation output data online while simulations are running and before …

Cited by 105 - Related articles - All 10 versions

[PDF] intel.com

Framework for scalable intra-node collective operations using shared memory

S Jain, R Kaleem, MG Balmana… - … Conference for High …, 2018 - ieeexplore.ieee.org

Collective operations are used in MPI programs to express common communication
patterns, collective computations, or synchronization. In many collectives, such as …

Cited by 38 - Related articles - All 6 versions

[PDF] kernel.org

[PDF] Popcorn: a replicated-kernel OS based on Linux

A Barbalace, B Ravindran, D Katz - Proceedings of the Linux Symposium …, 2014 - kernel.org

In recent years, the number of CPUs per platform has continuously increased, affecting
almost all segments of the computer market. Because of this trend, many researchers have …

Cited by 58 - Related articles - All 2 versions

[PDF] researchgate.net

The TH Express high performance interconnect networks

Z Pang, M Xie, J Zhang, Y Zheng, G Wang… - Frontiers of Computer …, 2014 - Springer

Interconnection network plays an important role in scalable high performance computer
(HPC) systems. The TH Express-2 interconnect has been used in MilkyWay-2 system to …

Cited by 58 - Related articles - All 7 versions

[PDF] hal.science

Kernel assisted collective intra-node MPI communication among multi-core and many-core CPUs

T Ma, G Bosilca, A Bouteiller, B Goglin… - 2011 International …, 2011 - ieeexplore.ieee.org

Shared memory is among the most common approaches to implementing message passing
within multicore nodes. However, current shared memory techniques do not scale with …

Cited by 61 - Related articles - All 10 versions
