The massive parallelism present in GPUs comes at the cost of reduced L1 and L2 cache sizes per thread, leading to serious cache contention problems such as thrashing. Hence …
X Xing, J Ji, Y Yao - 2018 IEEE international conference on …, 2018 - ieeexplore.ieee.org
Human brain network analysis based on machine learning has received much attention in the field of neuroimaging, where the application of convolutional neural networks (CNNs) is …
A Mirhosseini, A Sriraman… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
We are entering an era of “killer microseconds” in data center applications. Killer microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O …
The Register File (RF) in GPUs is a critical structure that maintains the state for thousands of threads that support the GPU processing model. The RF organization substantially affects …
S Darabi, N Mahani, H Baxishi… - Proceedings of the …, 2022 - dl.acm.org
Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have …
Dynamic neural networks enable higher representation flexibility compared to networks with a fixed architecture and are extensively deployed in problems dealing with varying input …
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for applications with massively data-parallel tasks. However, recent studies show that GPUs …
D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org
A commonly used GPU programming methodology is for adjacent threads to access data at neighboring or fixed-stride memory addresses and perform computations on the fetched …
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …