Dynamic load balancing using work-stealing

TO Odemuyiwa, JS Emer, JD Owens - arXiv preprint arXiv:2404.11591, 2024 - arxiv.org

In this work, we propose a unified abstraction for graph algorithms: the Extended General
Einsums language, or EDGE. The EDGE language expresses graph algorithms in the …

Cited by 3 | Related articles | All 2 versions

Lazy release consistency for GPUs

J Alsop, MS Orr, BM Beckmann… - 2016 49th Annual IEEE …, 2016 - ieeexplore.ieee.org

The heterogeneous-race-free (HRF) memory model has been embraced by the
Heterogeneous System Architecture (HSA) Foundation and OpenCL™ because it clearly …

Cited by 60 | Related articles | All 8 versions

A case for work-stealing on FPGAs with OpenCL atomics

N Ramanathan, J Wickerson, F Winterstein… - Proceedings of the …, 2016 - dl.acm.org

We provide a case study of work-stealing, a popular method for run-time load balancing, on
FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work …

Cited by 44 | Related articles | All 9 versions

Synchronization using remote-scope promotion

MS Orr, S Che, A Yilmazer, BM Beckmann… - ACM SIGARCH …, 2015 - dl.acm.org

Heterogeneous system architecture (HSA) and OpenCL define scoped synchronization to
facilitate low overhead communication across a subset of threads. Scoped synchronization …

Cited by 49 | Related articles | All 7 versions

A GPU task-parallel model with dependency resolution

S Tzeng, B Lloyd, JD Owens - IEEE Computer, 2012 - escholarship.org

We present a task-parallel programming model for the GPU. Our task model is robust
enough to handle irregular workloads that contain dependencies. We present two …

Cited by 46 | Related articles | All 10 versions

Remote-scope promotion: clarified, rectified, and verified

J Wickerson, M Batty, BM Beckmann… - Proceedings of the 2015 …, 2015 - dl.acm.org

Modern accelerator programming frameworks, such as OpenCL, organise threads into
work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD …

Cited by 33 | Related articles | All 10 versions

HQL: A scalable synchronization mechanism for GPUs

A Yilmazer, D Kaeli - 2013 IEEE 27th International Symposium …, 2013 - ieeexplore.ieee.org

Modern GPUs rely on atomic operations to perform global communication. These atomic
operations can be used to construct finer-grained locks to provide support for mutual …

Cited by 32 | Related articles | All 6 versions

Scalable collision detection using p-partition fronts on many-core processors

X Zhang, YJ Kim - IEEE Transactions on Visualization and …, 2013 - ieeexplore.ieee.org

We present a new parallel algorithm for collision detection using many-core computing
platforms of CPUs or GPUs. Based on the notion of a p-partition front, our algorithm is able to …

Cited by 30 | Related articles | All 8 versions

Lock‐Free Concurrent Data Structures

D Cederman, A Gidenstam, P Ha… - … multi‐core and …, 2017 - Wiley Online Library

Concurrent data structures are the data sharing side of parallel programming. An
implementation of a data structure is called lock‐free, if it allows multiple processes/threads …

Cited by 30 | Related articles | All 9 versions

SCALE: A hybrid MPI and multithreading based work stealing approach for massive contingency analysis in power systems

SK Khaitan, JD McCalley - Electric Power Systems Research, 2014 - Elsevier

In this paper, we present SCALE, a hybrid message passing interface (MPI) and
multithreading based work Stealing approach for massive Contingency AnaLysis in powEr …

Cited by 17 | Related articles | All 5 versions
