Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

GH Loh, MD Hill - Proceedings of the 44th Annual IEEE/ACM …, 2011 - dl.acm.org
Die-stacking technology enables multiple layers of DRAM to be integrated with multicore processors. A promising use of stacked DRAM is as a cache, since its capacity is insufficient to be all of main memory (for all but some embedded systems). However, a 1GB DRAM cache with 64-byte blocks requires 96MB of tag storage. Placing these tags on-chip is impractical (larger than on-chip L3s), while putting them in DRAM is slow (two full DRAM accesses for tag and data). Larger blocks and sub-blocking are possible, but less robust due to fragmentation.
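The 96MB figure follows directly from the cache geometry. A quick sketch of the arithmetic (the 48-bit per-block tag-plus-metadata size is an assumption consistent with the abstract's numbers, not a value stated here):

```python
# Tag-storage overhead for a 1GB DRAM cache with 64-byte blocks.
CACHE_BYTES = 1 << 30          # 1GB cache capacity
BLOCK_BYTES = 64               # conventional block size
TAG_ENTRY_BITS = 48            # assumed tag + metadata bits per block

num_blocks = CACHE_BYTES // BLOCK_BYTES        # 16,777,216 blocks
tag_bytes = num_blocks * TAG_ENTRY_BITS // 8   # total tag storage

print(num_blocks)                # 16777216
print(tag_bytes // (1 << 20))    # 96 (MB of tag storage)
```

At 6 bytes per block, the tag array alone exceeds typical on-chip L3 capacities, which is why the paper pushes tags into the stacked DRAM and optimizes the resulting access path.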
This work efficiently enables conventional block sizes for very large die-stacked DRAM caches with two innovations. First, we make hits faster than just storing tags in stacked DRAM by scheduling the tag and data accesses as a compound access so the data access is always a row buffer hit. Second, we make misses faster with a MissMap that eschews stacked-DRAM access on all misses. Like extreme sub-blocking, our implementation of the MissMap stores a vector of block-valid bits for each "page" in the DRAM cache. Unlike conventional sub-blocking, the MissMap (a) points to many more pages than can be stored in the DRAM cache (making the effects of fragmentation rare) and (b) does not point to the "way" that holds a block (but defers to the off-chip tags).
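The MissMap described above can be sketched as a structure mapping page numbers to vectors of block-valid bits: a clear bit guarantees the block is absent, so the stacked-DRAM probe can be skipped entirely on a miss. This is an illustrative model only; the page size, the dictionary-based storage, and the method names are assumptions, and the real design bounds the number of tracked pages in an SRAM structure rather than growing without limit:

```python
class MissMap:
    """Illustrative sketch of a MissMap: per-page vectors of
    block-valid bits that track presence only. Note it never
    records *which way* holds a block; that is left to the
    tags stored in the stacked DRAM."""

    def __init__(self, page_bytes=4096, block_bytes=64):
        self.page_bytes = page_bytes
        self.block_bytes = block_bytes
        # page number -> bit vector (int), one bit per block in the page
        self.entries = {}

    def _locate(self, addr):
        page, offset = divmod(addr, self.page_bytes)
        return page, offset // self.block_bytes

    def mark_installed(self, addr):
        """Set the valid bit when a block is filled into the cache."""
        page, blk = self._locate(addr)
        self.entries[page] = self.entries.get(page, 0) | (1 << blk)

    def mark_evicted(self, addr):
        """Clear the valid bit when a block is evicted."""
        page, blk = self._locate(addr)
        if page in self.entries:
            self.entries[page] &= ~(1 << blk)
            if self.entries[page] == 0:
                del self.entries[page]

    def may_hit(self, addr):
        """False means definitely absent: skip the stacked-DRAM
        tag access and go straight to off-chip main memory."""
        page, blk = self._locate(addr)
        return bool((self.entries.get(page, 0) >> blk) & 1)
```

Because an entry only records presence bits for a whole page, the map can cover far more pages than the cache can hold, which is what makes fragmentation effects rare relative to conventional sub-blocking.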
For the evaluated large-footprint commercial workloads, the proposed cache organization delivers 92.9% of the performance benefit of an ideal 1GB DRAM cache with an impractical 96MB on-chip SRAM tag array.