Characterizing communication and page usage of parallel applications for thread and data mapping

M Diener, EHM Cruz, LL Pilla, F Dupros… - Performance …, 2015 - Elsevier
The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …

Cache-efficient, intranode, large-message MPI communication with MPICH2-Nemesis

D Buntinas, B Goglin, D Goodell… - 2009 International …, 2009 - ieeexplore.ieee.org
The emergence of multicore processors raises the need to efficiently transfer large amounts
of data between local processes. MPICH2 is a highly portable MPI implementation whose …

MT-MPI: Multithreaded MPI for many-core environments

M Si, AJ Peña, P Balaji, M Takagi… - Proceedings of the 28th …, 2014 - dl.acm.org
Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds
of hardware threads. To utilize such architectures, application programmers are increasingly …

Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI protocols

D Buntinas, C Coti, T Herault, P Lemarinier… - Future Generation …, 2008 - Elsevier
A long-term trend in high-performance computing is the increasing number of nodes in
parallel computing platforms, which entails a higher failure probability. Fault tolerant …

SMARTMAP: Operating system support for efficient data sharing among processes on a multi-core processor

R Brightwell, K Pedretti… - SC'08: Proceedings of the …, 2008 - ieeexplore.ieee.org
This paper describes SMARTMAP, an operating system technique that implements fixed
offset virtual memory addressing. SMARTMAP allows the application processes on a multi …

Process-in-process: techniques for practical address-space sharing

A Hori, M Si, B Gerofi, M Takagi, J Dayal… - Proceedings of the 27th …, 2018 - dl.acm.org
The two most common parallel execution models for many-core CPUs today are
multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each …

Efficient shared memory message passing for inter-VM communications

F Diakhaté, M Perache, R Namyst… - Euro-Par 2008 Workshops …, 2009 - Springer
Thanks to recent advances in virtualization technologies, it is now possible to benefit from
the flexibility brought by virtual machines at little cost in terms of CPU performance. However …

RCKMPI – lightweight MPI implementation for Intel's Single-chip Cloud Computer (SCC)

IA Comprés Ureña, M Riepen, M Konow - Recent Advances in the …, 2011 - Springer
The Single-chip Cloud Computer (SCC) is an experimental processor created by
Intel Labs. It is a distributed memory architecture that provides shared memory possibilities …

Processor affinity and MPI performance on SMP-CMP clusters

C Zhang, X Yuan, A Srinivasan - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
Clusters of Symmetric MultiProcessing (SMP) nodes with multi-core Chip-Multiprocessors
(CMP), also known as SMP-CMP clusters, are becoming ubiquitous today. For Message …

Communication-aware process and thread mapping using online communication detection

M Diener, EHM Cruz, POA Navaux, A Busse, HU Heiß - Parallel Computing, 2015 - Elsevier
The rising complexity of memory hierarchies and interconnections in parallel shared
memory architectures leads to differences in the communication performance. These …