Understanding the impact of multi-core architecture in cluster computing: A case study with Intel dual-core system

L Chai, Q Gao, DK Panda - … on cluster computing and the grid …, 2007 - ieeexplore.ieee.org
Multi-core processors are growing as a new industry trend as single core processors rapidly
reach the physical limits of possible complexity and speed. In the new Top500 …

PAMI: A parallel active message interface for the Blue Gene/Q supercomputer

S Kumar, AR Mamidala, DA Faraj… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
The Blue Gene/Q machine is the next generation in the line of IBM massively parallel
supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each …

Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem

D Buntinas, G Mercier, W Gropp - Sixth IEEE International …, 2006 - ieeexplore.ieee.org
This paper presents a new low-level communication subsystem called Nemesis. Nemesis
has been designed and implemented to be scalable and efficient both in the intranode …

KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier
The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

Designing high performance and scalable MPI intra-node communication support for clusters

L Chai, A Hartono, DK Panda - 2006 IEEE International …, 2006 - ieeexplore.ieee.org
As new processor and memory architectures advance, clusters start to be built from larger
SMP systems, which makes MPI intra-node communication a critical issue in high …

Cache-efficient, intranode, large-message MPI communication with MPICH2-Nemesis

D Buntinas, B Goglin, D Goodell… - 2009 International …, 2009 - ieeexplore.ieee.org
The emergence of multicore processors raises the need to efficiently transfer large amounts
of data between local processes. MPICH2 is a highly portable MPI implementation whose …

Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem

D Buntinas, G Mercier, W Gropp - Parallel Computing, 2007 - Elsevier
This paper presents the implementation of MPICH2 over the Nemesis communication
subsystem and the evaluation of its shared-memory performance. We describe design …

SMARTMAP: Operating system support for efficient data sharing among processes on a multi-core processor

R Brightwell, K Pedretti… - SC'08: Proceedings of the …, 2008 - ieeexplore.ieee.org
This paper describes SMARTMAP, an operating system technique that implements fixed
offset virtual memory addressing. SMARTMAP allows the application processes on a multi …

Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems

HW Jin, S Sur, L Chai, DK Panda - 2007 IEEE International …, 2007 - ieeexplore.ieee.org
Modern processors have multiple cores on a chip to overcome power consumption and heat
dissipation issues. As more and more compute cores become available on a single node, it …

Benefits of cross memory attach for MPI libraries on HPC clusters

J Vienne - Proceedings of the 2014 Annual Conference on …, 2014 - dl.acm.org
With the number of cores per node increasing in modern clusters, an efficient
implementation of intra-node communications is critical for application performance. MPI …