Everything you always wanted to know about synchronization but were afraid to ask

T David, R Guerraoui, V Trigonakis - Proceedings of the Twenty-Fourth …, 2013 - dl.acm.org
This paper presents the most exhaustive study of synchronization to date. We span multiple
layers, from hardware cache-coherence protocols up to high-level concurrent software. We …

SynCron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …

Remote core locking: Migrating critical-section execution to improve the performance of multithreaded applications

JP Lozi, F David, G Thomas, J Lawall… - 2012 USENIX Annual …, 2012 - usenix.org
The scalability of multithreaded applications on current multicore systems is hampered by
the performance of lock algorithms, due to the costs of access contention and cache misses …

ffwd: Delegation is (much) faster than you think

S Roghanchi, J Eriksson, N Basu - … of the 26th Symposium on Operating …, 2017 - dl.acm.org
We revisit the question of delegation vs. synchronized access to shared memory, and show
through analysis and demonstration that delegation can be much faster than locking under a …

WiSync: An architecture for fast synchronization through on-chip wireless communication

S Abadal, A Cabellos-Aparicio, E Alarcon… - ACM SIGPLAN …, 2016 - dl.acm.org
In shared-memory multiprocessing, fine-grain synchronization is challenging because it
requires frequent communication. As technology scaling delivers larger manycore chips …

Adaptive contention management for fine-grained synchronization on commodity GPUs

L Gao, J Wang, W Zhang - ACM Transactions on Architecture and Code …, 2022 - dl.acm.org
As more emerging applications are moving to GPUs, fine-grained synchronization has
become imperative. However, their performance can be severely impaired in case of …

Fast and portable locking for multicore architectures

JP Lozi, F David, G Thomas, J Lawall… - ACM Transactions on …, 2016 - dl.acm.org
The scalability of multithreaded applications on current multicore systems is hampered by
the performance of lock algorithms, due to the costs of access contention and cache misses …

Efficient hardware barrier synchronization in many-core CMPs

JL Abellán, J Fernández… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Traditional software-based barrier implementations for shared memory parallel machines
tend to produce hotspots in terms of memory and network contention as the number of …

MiSAR: Minimalistic synchronization accelerator with resource overflow management

CK Liang, M Prvulovic - ACM SIGARCH Computer Architecture News, 2015 - dl.acm.org
While numerous hardware synchronization mechanisms have been proposed, they either
no longer function or suffer great performance loss when their hardware resources are …

Scalable adaptive NUMA-aware lock

M Zhang, H Chen, L Cheng, FCM Lau… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Scalable locking is a key building block for scalable multi-threaded software. Its performance
is especially critical in multi-socket, multi-core machines with non-uniform memory access …