Grus: Toward unified-memory-efficient high-performance graph processing on GPU

P Wang, J Wang, C Li, J Wang, H Zhu… - ACM Transactions on …, 2021 - dl.acm.org
Today's GPU graph processing frameworks face scalability and efficiency issues as the
graph size exceeds GPU-dedicated memory limit. Although recent GPUs can over-subscribe …

Cairo: A compiler-assisted technique for enabling instruction-level offloading of processing-in-memory

R Hadidi, L Nai, H Kim, H Kim - ACM Transactions on Architecture and …, 2017 - dl.acm.org
Three-dimensional (3D)-stacking technology and the memory-wall problem have
popularized processing-in-memory (PIM) concepts again, which offers the benefits of …

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

KL Spafford, JS Meredith, S Lee, D Li, PC Roth… - Proceedings of the 9th …, 2012 - dl.acm.org
With the rise of general purpose computing on graphics processing units (GPGPU), the
influence from consumer markets can now be seen across the spectrum of computer …

StreamMR: an optimized MapReduce framework for AMD GPUs

M Elteir, H Lin, W Feng… - 2011 IEEE 17th …, 2011 - ieeexplore.ieee.org
MapReduce is a programming model from Google that facilitates parallel processing on a
cluster of thousands of commodity computers. The success of MapReduce in cluster …

Optimized implementation of OpenCL kernels on FPGAs

K Shata, MK Elteir, AA El-Zoghabi - Journal of Systems Architecture, 2019 - Elsevier
Recently Field-Programmable Gate Array (FPGA) vendors, such as Altera and
Xilinx released an Open Computing Language Software Development Kit (OpenCL SDK) …

A parallel numerical acoustic simulation on a GPU using an edge-based smoothed finite element method

X Cao, Y Cai, X Cui - Advances in Engineering Software, 2020 - Elsevier
In this paper, a parallel computing scheme for performing implicit finite element calculations
on acoustic problems running on a graphics processing unit (GPU) is proposed. This …

GPUs as an opportunity for offloading garbage collection

M Maas, P Reames, J Morlan, K Asanović… - ACM SIGPLAN …, 2012 - dl.acm.org
GPUs have become part of most commodity systems. Nonetheless, they are often
underutilized when not executing graphics-intensive or special-purpose numerical …

Specializing coherence, consistency, and push/pull for GPU graph analytics

G Salvador, WH Darvin, M Huzaifa… - … Analysis of Systems …, 2020 - ieeexplore.ieee.org
This work explores the interaction of three communication-centric design dimensions for
graph workloads on emerging integrated CPU-GPU systems: update propagation with and …

Synchronization trade-offs in GPU implementations of graph algorithms

R Kaleem, A Venkat, S Pai, M Hall… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Although there is an extensive literature on GPU implementations of graph algorithms, we
do not yet have a clear understanding of how implementation choices impact performance …

Improving the Scalability of GPU Synchronization Primitives

P Dalmia, R Mahapatra, J Intan… - … on Parallel and …, 2022 - ieeexplore.ieee.org
General-purpose GPU applications increasingly use synchronization to enforce ordering
between many threads accessing shared data. Accordingly, recently there has been a push …