Multicore cache coherence control by a parallelizing compiler

H Kasahara, K Kimura, BA Adhi… - 2017 IEEE 41st …, 2017 - ieeexplore.ieee.org
H Kasahara, K Kimura, BA Adhi, Y Hosokawa, Y Kishimoto, M Mase
2017 IEEE 41st annual computer software and applications …, 2017 - ieeexplore.ieee.org
Recent developments in multicore technology have made processors with hundreds or thousands of cores feasible. On such processors, however, an efficient hardware cache coherence scheme becomes very complex and expensive to develop. This paper proposes a software coherence scheme, directed by a parallelizing compiler, for shared-memory multicore systems without hardware cache coherence control. The general idea is that an automatic parallelizing compiler analyzes the control and data dependencies among coarse-grain tasks in the program. Based on this information, it performs task parallelization, false sharing detection, and data restructuring to prevent false sharing. The compiler then inserts cache control code to handle the stale data problem. The proposed method is built on the OSCAR automatic parallelizing compiler and evaluated on the Renesas RP2, a processor with 8 SH-4A cores. Hardware cache coherence on the RP2 is available only for up to 4 cores, and it can be turned off completely to run in non-coherent cache mode. Performance is evaluated using 10 benchmark programs from SPEC2000, SPEC2006, the NAS Parallel Benchmarks (NPB), and MediaBench II. The proposed method performs as well as or better than the hardware cache coherence scheme. For example, 4 cores with the hardware coherence mechanism gave speedups over 1 core of 2.52 times for SPEC2000 "equake", 2.9 times for SPEC2006 "lbm", 3.34 times for NPB "cg", and 3.17 times for the MediaBench II MPEG2 Encoder. The proposed software cache coherence control gave 2.63 times on 4 cores and 4.37 times on 8 cores for "equake", 3.28 times on 4 cores and 4.76 times on 8 cores for "lbm", and 3.71 times on 4 cores and 4.92 times on 8 cores for the MPEG2 Encoder.
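The false sharing detection and data restructuring steps mentioned in the abstract can be illustrated with a small model. The sketch below is not the OSCAR compiler's algorithm; it is a minimal illustration, assuming a 32-byte cache line, an array whose base address is line-aligned, and hypothetical helper names (`lines_touched`, `find_false_sharing`, `naive_partition`, `aligned_partition`). It flags cache lines written by more than one core and then repartitions the array so that every core's chunk starts on a line boundary.

```python
from typing import Dict, List, Set, Tuple

LINE_SIZE = 32  # bytes per cache line; an assumed value for illustration

Region = Tuple[int, int]  # (start byte offset, length in bytes)


def lines_touched(start: int, length: int, line_size: int = LINE_SIZE) -> range:
    """Cache-line indices covered by the byte range [start, start + length)."""
    if length <= 0:
        return range(0)
    return range(start // line_size, (start + length - 1) // line_size + 1)


def find_false_sharing(writes: Dict[int, Region],
                       line_size: int = LINE_SIZE) -> List[int]:
    """Given disjoint per-core write regions, return the sorted cache lines
    that are written by more than one core (false sharing candidates)."""
    owners: Dict[int, Set[int]] = {}
    for core, (start, length) in writes.items():
        for line in lines_touched(start, length, line_size):
            owners.setdefault(line, set()).add(core)
    return sorted(line for line, cores in owners.items() if len(cores) > 1)


def naive_partition(n_elems: int, elem_size: int, n_cores: int) -> Dict[int, Region]:
    """Block partition that ignores cache-line boundaries."""
    per_core = -(-n_elems // n_cores)  # ceiling division
    parts: Dict[int, Region] = {}
    for core in range(n_cores):
        first = min(core * per_core, n_elems)
        last = min(first + per_core, n_elems)
        parts[core] = (first * elem_size, (last - first) * elem_size)
    return parts


def aligned_partition(n_elems: int, elem_size: int, n_cores: int,
                      line_size: int = LINE_SIZE) -> Dict[int, Region]:
    """Block partition whose chunk size is rounded up to whole cache lines.
    Assumes line_size is a multiple of elem_size and the array base is aligned."""
    elems_per_line = line_size // elem_size
    per_core = -(-n_elems // n_cores)
    per_core = -(-per_core // elems_per_line) * elems_per_line
    parts: Dict[int, Region] = {}
    for core in range(n_cores):
        first = min(core * per_core, n_elems)
        last = min(first + per_core, n_elems)
        parts[core] = (first * elem_size, (last - first) * elem_size)
    return parts


if __name__ == "__main__":
    # 100 four-byte elements split across 4 cores.
    print(find_false_sharing(naive_partition(100, 4, 4)))
    print(find_false_sharing(aligned_partition(100, 4, 4)))
```

In this model the aligned partition removes the shared lines at the cost of some load imbalance, since the last core receives a smaller chunk. Handling stale data is a separate step: on a non-coherent machine the writer would additionally need to write its dirty lines back, and the reader would need to invalidate its copies before consuming them, using cache control operations that are hardware specific and are not modeled here.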