M Khairy, Z Shen, TM Aamodt… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
In computer architecture, significant innovation frequently comes from industry. However, the simulation tools used by industry are often not released for open use, and even when they …
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off- chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …
Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers …
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread level …
General-purpose graphics processing units (GPG-PUs) are at their best in accelerating computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …
Z Wang, J Yang, R Melhem, B Childers… - … symposium on high …, 2016 - ieeexplore.ieee.org
Studies show that non-graphics programs can be less optimized for the GPU hardware, leading to significant resource under-utilization. Sharing the GPU among multiple programs …
Contemporary discrete GPUs support rich memory management features such as virtual memory and demand paging. These features simplify GPU programming by providing a …
M Lee, S Song, J Moon, J Kim, W Seo… - 2014 IEEE 20th …, 2014 - ieeexplore.ieee.org
High performance in GPGPU workloads is obtained by maximizing parallelism and fully utilizing the available resources. The thousands of threads are assigned to each core in …
In this paper, we present techniques that coordinate the thread scheduling and prefetching decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better …