CMOS logic density, performance, and cost. As such, slowdown in 2-D scaling, frequency …
Modern graphics processing units (GPUs) leverage a high degree of thread-level
parallelism, necessitating large-sized register files for storing numerous thread contexts. To …
Graphics processing units (GPUs) achieve high throughput by exploiting a high degree of
thread-level parallelism (TLP). To support such high TLP, GPUs have a large-sized register …
Recent GPUs provisioned with large register files (RFs) cannot fully utilize the bandwidth
between the RFs and execution pipelines, as the current policy for allocating operand (OP) …
[CITATION][C] TEA-RC: Thread Context-Aware Register Cache for GPUs
MKUK YOON