Performance implications of global virtual time algorithms on a Knights Landing processor

A Eker, B Williams, N Mishra, D Thakur, K Chiu, D Ponomarev, N Abu-Ghazaleh
2018 IEEE/ACM 22nd International Symposium on Distributed …, 2018 - ieeexplore.ieee.org
Recent studies have investigated the performance of Parallel Discrete Event Simulation (PDES) on Intel Xeon Phi manycore processors but generally reported underwhelming results, especially at high scales when all cores and thread contexts are fully loaded. While the lack of scalability in an earlier study on a Knights Corner (KC) processor is an artifact of physical limitations of the KC system, performance challenges on a Knights Landing (KNL) system partially stem from a slower global virtual time (GVT) computation algorithm used in that study. In this paper, we re-examine PDES performance on KNL under more efficient GVT algorithms to alleviate the GVT bottleneck. Specifically, we compare a synchronous GVT algorithm based on barrier synchronization with two asynchronous GVT implementations: a modified Mattern's algorithm for shared memory systems and a recently proposed wait-free algorithm. Using the ROSS simulator, we demonstrate that minimizing the GVT bottleneck significantly improves scalability, allowing simulation performance to scale all the way to 250 threads (per chip). Interestingly, we observe that while the wait-free algorithm is a clear winner for balanced models, barrier-based GVT provides significantly better results for imbalanced models executed at high scale. We also perform detailed simulation profiling to understand the underlying reasons for these performance trends.
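For readers unfamiliar with the synchronous variant named in the abstract, the following is a minimal sketch of the general idea behind a barrier-based GVT computation on shared memory. It is not the paper's or ROSS's implementation. The function name `barrier_gvt` and its input layout are hypothetical, and the sketch assumes that no messages are in transit when threads reach the barrier, so each thread's pending event timestamps fully describe its state.

```python
import threading


def barrier_gvt(pending_per_thread):
    """Return the GVT value each thread computes (hypothetical helper).

    pending_per_thread[i] holds the timestamps of unprocessed events on
    thread i. Assumption: no messages are in flight at the barrier.
    """
    n = len(pending_per_thread)
    local_min = [float("inf")] * n   # one shared slot per thread
    results = [None] * n
    barrier = threading.Barrier(n)

    def worker(tid):
        # Step 1: publish the smallest timestamp this thread still has to process.
        local_min[tid] = min(pending_per_thread[tid], default=float("inf"))
        # Step 2: block until every thread has published its local minimum.
        barrier.wait()
        # Step 3: every thread reduces the same array, so all agree on the GVT.
        results[tid] = min(local_min)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every thread stalls at the barrier until the slowest one arrives, which is the synchronization cost that the asynchronous algorithms mentioned in the abstract are designed to avoid.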