Authors
Devashree Tripathy, Amirali Abdolrashidi, Laxmi Narayan Bhuyan, Liang Zhou, Daniel Wong
Publication date
2021/6/8
Journal
ACM Transactions on Architecture and Code Optimization (TACO)
Volume
18
Issue
3
Pages
1-26
Publisher
ACM
Description
The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache sizes per thread, leading to serious cache contention problems such as thrashing. Hence, the data access locality of an application should be considered during thread scheduling to improve execution time and energy consumption. Recent works have tried to use the locality behavior of regular and structured applications in thread scheduling, but the difficult case of irregular and unstructured parallel applications remains to be explored.
We present PAVER, a Priority-Aware Vertex schedulER, which takes a graph-theoretic approach toward thread scheduling. We analyze the cache locality behavior among thread blocks (TBs) through just-in-time compilation, and model the problem as a graph representing the TBs and the locality among them. This graph is then partitioned into TB groups that display maximum data …
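The graph view described in the abstract can be illustrated with a small sketch. This is not PAVER's actual algorithm (the abstract is truncated before the partitioning method is named); it assumes a hypothetical input `tb_lines` mapping each TB id to the set of cache lines it touches, weights each edge by the number of shared lines, and substitutes a simple greedy grouping for the real partitioner. The function names are illustrative only.

```python
from itertools import combinations


def build_locality_graph(tb_lines):
    """Return {(a, b): shared_line_count} for every TB pair that shares data.

    tb_lines is an assumed input: TB id -> set of cache-line addresses.
    """
    graph = {}
    for a, b in combinations(sorted(tb_lines), 2):
        shared = len(tb_lines[a] & tb_lines[b])
        if shared:
            graph[(a, b)] = shared
    return graph


def greedy_groups(tb_lines, group_size):
    """Greedily pack TBs into groups of at most group_size.

    Each group is seeded with the lowest unassigned TB id, then grown by
    repeatedly adding the TB that shares the most lines with the group.
    This greedy rule is a stand-in, not the partitioning used in the paper.
    """
    graph = build_locality_graph(tb_lines)

    def weight(a, b):
        return graph.get((min(a, b), max(a, b)), 0)

    unassigned = sorted(tb_lines)
    groups = []
    while unassigned:
        group = [unassigned.pop(0)]
        while unassigned and len(group) < group_size:
            best = max(unassigned, key=lambda tb: sum(weight(tb, m) for m in group))
            unassigned.remove(best)
            group.append(best)
        groups.append(group)
    return groups


# Toy example: TBs 0 and 2 share lines 2 and 3; TBs 1 and 3 share line 11.
example = {0: {1, 2, 3}, 1: {10, 11}, 2: {2, 3, 4}, 3: {11, 12}}
print(greedy_groups(example, group_size=2))
```

In this toy input the TBs that overlap in their accessed lines end up in the same group, which is the property a locality-aware scheduler would want when assigning groups to the same core.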
Scholar articles
D Tripathy, A Abdolrashidi, LN Bhuyan, L Zhou… - ACM Transactions on Architecture and Code …, 2021