Performance characterization and optimization of atomic operations on AMD GPUs

P Wang, J Wang, C Li, J Wang, H Zhu… - ACM Transactions on …, 2021 - dl.acm.org

Today's GPU graph processing frameworks face scalability and efficiency issues as the
graph size exceeds GPU-dedicated memory limit. Although recent GPUs can over-subscribe …

Cited by 41 · Related articles

[PDF] acm.org

Cairo: A compiler-assisted technique for enabling instruction-level offloading of processing-in-memory

R Hadidi, L Nai, H Kim, H Kim - ACM Transactions on Architecture and …, 2017 - dl.acm.org

Three-dimensional (3D)-stacking technology and the memory-wall problem have
popularized processing-in-memory (PIM) concepts again, which offers the benefits of …

Cited by 71 · Related articles · All 4 versions

[PDF] pasalabs.org

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

KL Spafford, JS Meredith, S Lee, D Li, PC Roth… - Proceedings of the 9th …, 2012 - dl.acm.org

With the rise of general purpose computing on graphics processing units (GPGPU), the
influence from consumer markets can now be seen across the spectrum of computer …

Cited by 87 · Related articles · All 8 versions

[PDF] vt.edu

StreamMR: an optimized MapReduce framework for AMD GPUs

M Elteir, H Lin, W Feng… - 2011 IEEE 17th …, 2011 - ieeexplore.ieee.org

MapReduce is a programming model from Google that facilitates parallel processing on a
cluster of thousands of commodity computers. The success of MapReduce in cluster …

Cited by 64 · Related articles · All 16 versions

[PDF] researchgate.net

Optimized implementation of OpenCL kernels on FPGAs

K Shata, MK Elteir, AA El-Zoghabi - Journal of Systems Architecture, 2019 - Elsevier

Abstract Recently Field-Programmable Gate Array (FPGA) vendors, such as Altera and
Xilinx released an Open Computing Language Software Development Kit (OpenCL SDK) …

Cited by 25 · Related articles · All 3 versions

A parallel numerical acoustic simulation on a GPU using an edge-based smoothed finite element method

X Cao, Y Cai, X Cui - Advances in Engineering Software, 2020 - Elsevier

In this paper, a parallel computing scheme for performing implicit finite element calculations
on acoustic problems running on a graphics processing unit (GPU) is proposed. This …

Cited by 17 · Related articles

[PDF] berkeley.edu

GPUs as an opportunity for offloading garbage collection

M Maas, P Reames, J Morlan, K Asanović… - ACM SIGPLAN …, 2012 - dl.acm.org

GPUs have become part of most commodity systems. Nonetheless, they are often
underutilized when not executing graphics-intensive or special-purpose numerical …

Cited by 31 · Related articles · All 12 versions

[PDF] arxiv.org

Specializing coherence, consistency, and push/pull for GPU graph analytics

G Salvador, WH Darvin, M Huzaifa… - … Analysis of Systems …, 2020 - ieeexplore.ieee.org

This work explores the interaction of three communication-centric design dimensions for
graph workloads on emerging integrated CPU-GPU systems: update propagation with and …

Cited by 13 · Related articles · All 8 versions

[PDF] rochester.edu

Synchronization trade-offs in GPU implementations of graph algorithms

R Kaleem, A Venkat, S Pai, M Hall… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

Although there is an extensive literature on GPU implementations of graph algorithms, we
do not yet have a clear understanding of how implementation choices impact performance …

Cited by 22 · Related articles · All 8 versions

Improving the Scalability of GPU Synchronization Primitives

P Dalmia, R Mahapatra, J Intan… - … on Parallel and …, 2022 - ieeexplore.ieee.org

General-purpose GPU applications increasingly use synchronization to enforce ordering
between many threads accessing shared data. Accordingly, recently there has been a push …

Cited by 9 · Related articles · All 2 versions
