GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs

JL Abellán, J Fernández, ME Acacio
2011 IEEE International Parallel & Distributed Processing Symposium, 2011 (ieeexplore.ieee.org)
Synchronization is of paramount importance to exploit thread-level parallelism on many-core CMPs. In these architectures, synchronization mechanisms usually rely on shared variables to coordinate multithreaded access to shared data structures, thus avoiding data dependency conflicts. Lock synchronization is known to be a key limitation to performance and scalability. On the one hand, lock acquisition through busy waiting on shared variables generates additional coherence activity that interferes with applications. On the other hand, lock contention causes serialization, which results in performance degradation. This paper proposes and evaluates GLocks, a hardware-supported implementation for highly-contended locks in the context of many-core CMPs. GLocks use a token-based message-passing protocol over a dedicated network built on state-of-the-art technology. This approach bypasses the memory hierarchy to provide a non-intrusive, extremely efficient and fair lock implementation with negligible impact on energy consumption or die area. A comprehensive comparison against the most efficient shared-memory-based lock implementation, for a set of microbenchmarks and real applications, quantifies the benefits of GLocks. Performance results show average reductions of 42% and 14% in execution time, 76% and 23% in network traffic, and 78% and 28% in the energy-delay product (EDP) of the full CMP, for the microbenchmarks and the real applications, respectively. In light of these results, we conclude that GLocks satisfy our initial working hypothesis: GLocks minimize the cache-coherence network traffic due to lock synchronization, which translates into reduced power consumption and execution time.
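
To make the busy-waiting problem described in the abstract concrete, the following is a minimal sketch of a software test-and-test-and-set spinlock in C++. It is an illustration only: the names `TtasLock` and `run_contended_counter` are hypothetical, and this is neither the GLocks hardware mechanism nor necessarily the shared-memory baseline the paper compares against. Every thread spins on the same shared variable, so each release can trigger coherence traffic for all waiting cores, which is the kind of activity the abstract says GLocks avoid by bypassing the memory hierarchy.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-test-and-set spinlock (illustrative, not from the paper).
class TtasLock {
public:
    void lock() {
        for (;;) {
            // Spin on a plain load first, so waiters read a locally cached copy
            // instead of issuing an atomic write on every iteration.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // The exchange needs exclusive ownership of the cache line; under
            // contention all waiters race here after each release.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        // The store invalidates the copies cached by spinning waiters.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical driver: several threads increment one counter under the lock.
long run_contended_counter(int num_threads, int iters_per_thread) {
    TtasLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;  // critical section, serialized by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_contended_counter(4, 1000)` should return 4000 if mutual exclusion holds. The serialized critical section and the shared `locked_` flag correspond to the two costs the abstract names: lock contention and extra coherence activity.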