Accelerating irregular algorithms on GPGPUs using fine-grain hardware worklists

SM Habib, S Ries, M Muhlhauser - 2010 7th International …, 2010 - ieeexplore.ieee.org

Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …

Cited by: 192 | Related articles | All 10 versions

Energy efficient architecture for graph analytics accelerators

MM Ozdal, S Yesil, T Kim, A Ayupov, J Greth… - ACM SIGARCH …, 2016 - dl.acm.org

Specialized hardware accelerators can significantly improve the performance and power
efficiency of compute systems. In this paper, we focus on hardware accelerators for graph …

Cited by: 213 | Related articles | All 8 versions

[BOOK][B] General-purpose graphics processor architectures

TM Aamodt, WWL Fung, TG Rogers, M Martonosi - 2018 - Springer

Originally developed to support video games, graphics processor units (GPUs) are now
increasingly used for general-purpose (non-graphics) applications ranging from machine …

Cited by: 103 | Related articles | All 5 versions

Efficient GPU synchronization without scopes: Saying no to complex consistency models

MD Sinclair, J Alsop, SV Adve - … of the 48th International Symposium on …, 2015 - dl.acm.org

As GPUs have become increasingly general purpose, applications with more general
sharing patterns and fine-grained synchronization have started to emerge. Unfortunately …

Cited by: 92 | Related articles | All 12 versions

Parallel graph analytics

A Lenharth, D Nguyen, K Pingali - Communications of the ACM, 2016 - dl.acm.org

Communications of the ACM, May 2016, Vol. 59, No. 5, p. 78. Contributed articles.
DOI:10.1145/2901919. Data-centric abstractions and execution …

Cited by: 76 | Related articles

Spandex: A flexible interface for efficient heterogeneous coherence

J Alsop, M Sinclair, S Adve - 2018 ACM/IEEE 45th Annual …, 2018 - ieeexplore.ieee.org

Recent heterogeneous architectures have trended toward tighter integration and shared
memory largely due to the efficient communication and programmability enabled by this …

Cited by: 62 | Related articles | All 7 versions

Minnow: Lightweight offload engines for worklist management and worklist-directed prefetching

D Zhang, X Ma, M Thomson, D Chiou - ACM SIGPLAN Notices, 2018 - dl.acm.org

The importance of irregular applications such as graph analytics is rapidly growing with the
rise of Big Data. However, parallel graph workloads tend to perform poorly on general …

Cited by: 62 | Related articles | All 4 versions

HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems

X Ren, D Lustig, E Bolotin, A Jaleel… - … Symposium on High …, 2020 - ieeexplore.ieee.org

Prior work on GPU cache coherence has shown that simple hardware- or software-based
protocols can be more than sufficient. However, in recent years, features such as multi-chip …

Cited by: 41 | Related articles | All 3 versions

LaPerm: Locality aware scheduler for dynamic parallelism on GPUs

J Wang, N Rubin, A Sidelnik… - ACM SIGARCH Computer …, 2016 - dl.acm.org

Recent developments in GPU execution models and architectures have introduced dynamic
parallelism to facilitate the execution of irregular applications where control flow and …

Cited by: 69 | Related articles | All 5 versions

Free launch: optimizing GPU dynamic kernel launches through thread reuse

G Chen, X Shen - Proceedings of the 48th International Symposium on …, 2015 - dl.acm.org

Supporting dynamic parallelism is important for GPU to benefit a broad range of
applications. There are currently two fundamental ways for programs to exploit dynamic …

Cited by: 73 | Related articles | All 6 versions
