[BOOK][B] The art of multiprocessor programming

M Herlihy, N Shavit, V Luchangco, M Spear - 2020 - books.google.com
The Art of Multiprocessor Programming, Second Edition, provides users with an authoritative
guide to multicore programming. This updated edition introduces higher level software …

Algorithms for scalable synchronization on shared-memory multiprocessors

JM Mellor-Crummey, ML Scott - ACM Transactions on Computer …, 1991 - dl.acm.org
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in
shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting …

Optimization of collective communication operations in MPICH

R Thakur, R Rabenseifner… - The International Journal …, 2005 - journals.sagepub.com
We describe our work on improving the performance of collective communication operations
in MPICH for clusters connected by switched networks. For each collective operation, we …

Memory coherence in shared virtual memory systems

K Li, P Hudak - ACM Transactions on Computer Systems (TOCS), 1989 - dl.acm.org
The memory coherence problem in designing and implementing a shared virtual memory on
loosely coupled multiprocessors is studied in depth. Two classes of algorithms, centralized …

[BOOK][B] Program synthesis by sketching

A Solar-Lezama - 2008 - search.proquest.com
The goal of software synthesis is to generate programs automatically from high-level
specifications. However, efficient implementations for challenging programs require a …

Syncron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …

Stateful Serverless Computing with Crucial

D Barcelona-Pons, P Sutra… - ACM Transactions on …, 2022 - dl.acm.org
Serverless computing greatly simplifies the use of cloud resources. In particular, Function-as-
a-Service (FaaS) platforms enable programmers to develop applications as individual …

Implementation and performance of Munin

JB Carter, JK Bennett, W Zwaenepoel - ACM SIGOPS Operating …, 1991 - dl.acm.org
Munin is a distributed shared memory (DSM) system that allows shared memory parallel
programs to be executed efficiently on distributed memory multiprocessors. Munin is unique …

SparCML: High-performance sparse communication for machine learning

C Renggli, S Ashkboos, M Aghagolzadeh… - Proceedings of the …, 2019 - dl.acm.org
Applying machine learning techniques to the quickly growing data in science and industry
requires highly-scalable algorithms. Large datasets are most commonly processed "data …

Near-optimal sparse allreduce for distributed deep learning

S Li, T Hoefler - Proceedings of the 27th ACM SIGPLAN Symposium on …, 2022 - dl.acm.org
Communication overhead is one of the major obstacles to train large deep learning models
at scale. Gradient sparsification is a promising technique to reduce the communication …