A case for standard non-blocking collective operations

S Li, T Hoefler - Proceedings of the International Conference for High …, 2021 - dl.acm.org

Training large deep learning models at scale is very challenging. This paper proposes
Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for …

Cited by 129 · Related articles · All 25 versions

An overview of MPI characteristics of exascale proxy applications

B Klenk, H Fröning - … Computing: 32nd International Conference, ISC High …, 2017 - Springer

The scale of applications and computing systems is tremendously increasing and needs to
increase even more to realize exascale systems. As the number of nodes keeps growing …

Cited by 56 · Related articles · All 6 versions

[PDF] arxiv.org

Distributed quantum computing with QMPI

T Häner, DS Steiger, T Hoefler, M Troyer - Proceedings of the …, 2021 - dl.acm.org

Practical applications of quantum computers require millions of physical qubits and it will be
challenging for individual quantum processors to reach such qubit numbers. It is therefore …

Cited by 49 · Related articles · All 24 versions

[PDF] uantwerpen.be

Hiding global communication latency in the GMRES algorithm on massively parallel machines

P Ghysels, TJ Ashby, K Meerbergen… - SIAM journal on scientific …, 2013 - SIAM

In the generalized minimal residual method (GMRES), the global all-to-all communication
required in each iteration for orthogonalization and normalization of the Krylov base vectors …

Cited by 169 · Related articles · All 11 versions

[PDF] ethz.ch

Message progression in parallel computing - to thread or not to thread?

T Hoefler, A Lumsdaine - 2008 IEEE International Conference …, 2008 - ieeexplore.ieee.org

Message progression schemes that enable communication and computation to be
overlapped have the potential to improve the performance of parallel applications. With …

Cited by 151 · Related articles · All 30 versions

[PDF] unixer.de

Towards efficient MapReduce using MPI

T Hoefler, A Lumsdaine, J Dongarra - European Parallel Virtual Machine …, 2009 - Springer

MapReduce is an emerging programming paradigm for data-parallel applications. We
discuss common strategies to implement a MapReduce runtime and propose an optimized …

Cited by 126 · Related articles · All 38 versions

[PDF] arxiv.org

Mitigating network noise on dragonfly networks through application-aware routing

D De Sensi, S Di Girolamo, T Hoefler - Proceedings of the International …, 2019 - dl.acm.org

System noise can negatively impact the performance of HPC systems, and the
interconnection network is one of the main factors contributing to this problem. To mitigate …

Cited by 41 · Related articles · All 26 versions

[PDF] manchester.ac.uk

Performance analysis of asynchronous Jacobi's method implemented in MPI, SHMEM and OpenMP

I Bethune, JM Bull, NJ Dingle… - … International Journal of …, 2014 - journals.sagepub.com

Ever-increasing core counts create the need to develop parallel algorithms that avoid
closely coupled execution across all cores. We present performance analysis of several …

Cited by 52 · Related articles · All 8 versions

[PDF] acm.org

Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes

JL Träff, I Vardas - Proceedings of the 30th European MPI Users' Group …, 2023 - dl.acm.org

A major design objective of MPI is to enable support for the construction of safe parallel
libraries that can be used and mixed freely in complex applications. In this respect, MPI has …

Cited by 3 · Related articles · All 3 versions

[PDF] technion.ac.il

Distributed adaptive routing convergence to non-blocking DCN routing assignments

E Zahavi, I Keslassy, A Kolodny - IEEE Journal on Selected …, 2013 - ieeexplore.ieee.org

With the growing popularity of big-data applications, Data Center Networks (DCN)
increasingly carry larger and longer traffic flows. As a result of this increased flow granularity …

Cited by 31 · Related articles · All 5 versions
