Everything you always wanted to know about synchronization but were afraid to ask
This paper presents the most exhaustive study of synchronization to date. We span multiple
layers, from hardware cache-coherence protocols up to high-level concurrent software. We …
Syncron: Efficient synchronization support for near-data-processing architectures
C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …
Remote core locking: Migrating critical-section execution to improve the performance of multithreaded applications
The scalability of multithreaded applications on current multicore systems is hampered by
the performance of lock algorithms, due to the costs of access contention and cache misses …
Ffwd: Delegation is (much) faster than you think
S Roghanchi, J Eriksson, N Basu - … of the 26th Symposium on Operating …, 2017 - dl.acm.org
We revisit the question of delegation vs. synchronized access to shared memory, and show
through analysis and demonstration that delegation can be much faster than locking under a …
WiSync: An architecture for fast synchronization through on-chip wireless communication
In shared-memory multiprocessing, fine-grain synchronization is challenging because it
requires frequent communication. As technology scaling delivers larger manycore chips …
Adaptive contention management for fine-grained synchronization on commodity GPUs
L Gao, J Wang, W Zhang - ACM Transactions on Architecture and Code …, 2022 - dl.acm.org
As more emerging applications are moving to GPUs, fine-grained synchronization has
become imperative. However, their performance can be severely impaired in case of …
Fast and portable locking for multicore architectures
The scalability of multithreaded applications on current multicore systems is hampered by
the performance of lock algorithms, due to the costs of access contention and cache misses …
Efficient hardware barrier synchronization in many-core CMPs
JL Abellán, J Fernández… - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Traditional software-based barrier implementations for shared memory parallel machines
tend to produce hotspots in terms of memory and network contention as the number of …
MiSAR: Minimalistic synchronization accelerator with resource overflow management
CK Liang, M Prvulovic - ACM SIGARCH Computer Architecture News, 2015 - dl.acm.org
While numerous hardware synchronization mechanisms have been proposed, they either
no longer function or suffer great performance loss when their hardware resources are …
Scalable adaptive NUMA-aware lock
Scalable locking is a key building block for scalable multi-threaded software. Its performance
is especially critical in multi-socket, multi-core machines with non-uniform memory access …