Argobots: A lightweight low-level threading and tasking framework

S Seo, A Amer, P Balaji, C Bordage… - … on Parallel and …, 2017 - ieeexplore.ieee.org
In the past few decades, a number of user-level threading and tasking models have been
proposed in the literature to address the shortcomings of OS-level threads, primarily with …

Slaw: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems

Y Guo, J Zhao, V Cave, V Sarkar - … of the 15th ACM SIGPLAN Symposium …, 2010 - dl.acm.org
This poster introduces SLAW, a Scalable Locality-aware Adaptive Work-stealing scheduler.
The SLAW features an adaptive task scheduling algorithm combined with a locality-aware …

Optimizing load balancing and data-locality with data-aware scheduling

K Wang, X Zhou, T Li, D Zhao, M Lang… - … Conference on Big …, 2014 - ieeexplore.ieee.org
Load balancing techniques (e.g., work stealing) are important to obtain the best performance
for distributed task scheduling systems that have multiple schedulers making scheduling …

Software challenges in extreme scale systems

V Sarkar, W Harrod, AE Snavely - Journal of Physics: Conference …, 2009 - iopscience.iop.org
Computer systems anticipated in the 2015–2020 timeframe are referred to as Extreme Scale
because they will be built using massive multi-core processors with hundreds of cores per chip …

Hierarchical work stealing on manycore clusters

SJ Min, C Iancu, K Yelick - Fifth Conference on Partitioned Global Address …, 2011 - Citeseer
Partitioned Global Address Space languages like UPC offer a convenient way of
expressing large shared data structures, especially for irregular structures that require …

Lifeline-based global load balancing

VA Saraswat, P Kambadur, S Kodali, D Grove… - ACM SIGPLAN …, 2011 - dl.acm.org
On shared-memory systems, Cilk-style work-stealing has been used to effectively parallelize
irregular task-graph based applications such as Unbalanced Tree Search (UTS). There are …

Scalable and precise dynamic datarace detection for structured parallelism

R Raman, J Zhao, V Sarkar, M Vechev, E Yahav - ACM SIGPLAN Notices, 2012 - dl.acm.org
Existing dynamic race detectors suffer from at least one of the following three limitations: (i)
space overhead per memory location grows linearly with the number of parallel threads [13] …

Exascale software study: Software challenges in extreme scale systems

S Amarasinghe, D Campbell, W Carlson… - DARPA IPTO, Air Force …, 2009 - Citeseer
Extreme Scale processors containing hundreds or even thousands of cores will challenge
current operating system (OS) practices. Many of the fundamental assumptions that underlie …

Integrating asynchronous task parallelism with MPI

S Chatterjee, S Tasırlar, Z Budimlic… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Effective combination of inter-node and intra-node parallelism is recognized to be a major
challenge for future extreme-scale systems. Many researchers have demonstrated the …

Efficient data race detection for async-finish parallelism

R Raman, J Zhao, V Sarkar, M Vechev… - Formal Methods in System …, 2012 - Springer
A major productivity hurdle for parallel programming is the presence of data races. Data
races can lead to all kinds of harmful program behaviors, including determinism violations …