Performance optimization on big.LITTLE architectures: A memory-latency aware approach

W Wolff, B Porter - The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools …, 2020 - dl.acm.org
The energy demands of modern mobile devices have driven a trend towards heterogeneous multi-core systems that include various types of cores tuned for performance or energy efficiency, offering a rich optimization space for software. On such systems, data coherency between cores is automatically ensured by an interconnect between processors. On some chip designs the performance of this interconnect, and by extension of the entire CPU cluster, is highly dependent on the software's memory access characteristics and on the set of frequencies chosen for the CPU cores. Existing frequency scaling mechanisms in operating systems use a simple load-based heuristic to tune CPU frequencies, and so fail to achieve a holistically good configuration across such diverse clusters. We propose a new adaptive governor to solve this problem. It uses a simple trained hardware model of cache interconnect characteristics, along with real-time hardware monitors, to continually adjust core frequencies to maximize system performance. We evaluate our governor on the Exynos5422 SoC, as used in the Samsung Galaxy S5, across a range of standard benchmarks. The results show that our approach achieves a speedup of up to 40% and a 70% energy saving, including a 30% speedup in common mobile applications such as video decoding and web browsing.
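To make the contrast in this abstract concrete, the following Python sketch compares a simplified load-threshold frequency rule with a model-driven governor step. The governor step searches every (big, LITTLE) frequency pair using a predicted-throughput model fed by a memory-intensity reading from hardware counters. Everything here is hypothetical: the frequency tables, the `memory_intensity` mapping, the penalty-based `predicted_throughput` formula, and its weights are illustrative assumptions. They are not the authors' trained model or the Exynos5422's real operating points.

```python
from itertools import product

# Hypothetical per-cluster frequency tables in MHz (illustrative values only).
BIG_FREQS = [800, 1200, 1600, 2000]
LITTLE_FREQS = [600, 1000, 1400]


def load_based(util, freqs, up_threshold=0.8):
    """Simplified load-threshold heuristic: jump to the top frequency when
    utilisation is high, otherwise scale with utilisation. Memory behaviour
    and the other cluster's frequency are ignored entirely."""
    if util >= up_threshold:
        return freqs[-1]
    target = util * freqs[-1] / up_threshold
    # Lowest available frequency that meets the scaled target.
    return next((f for f in freqs if f >= target), freqs[-1])


def memory_intensity(cache_misses, instructions, scale=50.0):
    """Map a misses-per-instruction sample from hardware counters to [0, 1].
    The counter choice and `scale` are assumptions, not from the paper."""
    if instructions == 0:
        return 0.0
    return min(1.0, scale * cache_misses / instructions)


def predicted_throughput(f_big, f_little, mem_rate, w=(1.0, 1.0, 1.5)):
    """Toy stand-in for a trained model (hypothetical): throughput grows with
    both cluster frequencies, minus a penalty for running the clusters far
    apart that only matters when the workload is memory-intensive, as a
    crude proxy for an interconnect bottleneck. The weights `w` are made up."""
    compute = w[0] * f_big + w[1] * f_little
    return compute - w[2] * mem_rate * abs(f_big - f_little)


def choose_config(mem_rate, w=(1.0, 1.0, 1.5)):
    """Governor step: evaluate the model on every (big, LITTLE) frequency
    pair and return the pair with the highest predicted throughput."""
    return max(
        product(BIG_FREQS, LITTLE_FREQS),
        key=lambda cfg: predicted_throughput(cfg[0], cfg[1], mem_rate, w),
    )


if __name__ == "__main__":
    # Simulated counter samples; a real governor would read PMU counters.
    for misses, instrs in [(0, 1_000_000), (40_000, 1_000_000)]:
        m = memory_intensity(misses, instrs)
        print(f"mem_rate={m:.2f} -> (big, LITTLE) MHz = {choose_config(m)}")
```

Running the script prints `(2000, 1400)` for the compute-bound sample and `(1600, 1400)` for the memory-bound one. Under this toy model, the load-based rule at high utilisation would simply pick the top frequency on each cluster regardless of memory behaviour, while the model-driven step trades some big-cluster frequency for a configuration it predicts is faster overall. This is the kind of whole-cluster decision the abstract argues per-core load heuristics miss.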