Performance-centric register file design for GPUs using racetrack memory

S Wang, Y Liang, C Zhang, X Xie, G Sun, Y Liu, Y Wang, X Li

2016 21st Asia and South Pacific Design Automation Conference (ASP …, 2016 • ieeexplore.ieee.org

The key to high performance in GPU architectures lies in massive threading, which drives the large number of cores and enables overlapping of thread execution. In practice, however, the number of threads that can execute simultaneously is often limited by the size of the GPU register file. The traditional SRAM-based register file occupies so much chip area that it cannot scale to meet the growing demand for massive threading in GPU applications. Racetrack memory is a promising technology for designing large-capacity register files on GPUs because of its high data storage density. However, without careful deployment of registers, the lengthy shift operations of racetrack memory may hurt performance. In this paper, we explore racetrack memory for designing a high-performance register file for GPU architectures. The high storage density of racetrack memory helps improve thread-level parallelism, i.e., the number of threads that execute simultaneously. However, if the bits of a register are not aligned to the access ports, shift operations are required to move those bits to the ports. To mitigate this shift overhead, we develop a register file preshifting strategy and a compile-time managed register mapping algorithm. Experimental results demonstrate that our technique improves performance by up to 24% (19% on average) across a variety of GPU applications.
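The shift overhead described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's preshifting strategy or mapping algorithm; it only assumes a single racetrack stripe with fixed access ports, where each access costs one shift per domain between the requested slot and the nearest port. The names `Track`, `run`, and the example mappings are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Toy model of one racetrack stripe (an assumption, not the paper's design)."""
    length: int            # number of storage domains (register slots) on the stripe
    ports: tuple           # positions of the fixed access ports
    offset: int = 0        # how far the stripe is currently shifted
    total_shifts: int = 0  # accumulated single-domain shifts

    def access(self, slot: int) -> int:
        """Shift the stripe so `slot` sits under a port; return the shifts used."""
        if not 0 <= slot < self.length:
            raise ValueError("slot outside the stripe")
        pos = slot + self.offset  # where the slot currently sits
        # Pick the port that needs the fewest single-domain shifts.
        delta = min((p - pos for p in self.ports), key=abs)
        self.offset += delta
        self.total_shifts += abs(delta)
        return abs(delta)


def run(trace, mapping, length=8, ports=(0,)):
    """Replay a register access trace under a register-to-slot mapping."""
    track = Track(length, ports)
    for reg in trace:
        track.access(mapping[reg])
    return track.total_shifts


if __name__ == "__main__":
    trace = ["r0", "r1", "r0", "r1", "r2", "r3", "r2", "r3"]
    scattered = {"r0": 0, "r1": 7, "r2": 1, "r3": 6}  # co-accessed registers far apart
    clustered = {"r0": 0, "r1": 1, "r2": 2, "r3": 3}  # co-accessed registers adjacent
    print("scattered mapping shifts:", run(trace, scattered))
    print("clustered mapping shifts:", run(trace, clustered))
```

In this model, placing registers that are accessed together in neighbouring slots needs far fewer shifts than scattering them, which is the intuition behind managing the register mapping at compile time. Preshifting, which would issue the shift ahead of the access to hide its latency, is not modelled here.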

Cited by 20 · Related articles · All 5 versions
