Habanero-Java library: a Java 8 framework for multicore programming

S Imam, V Sarkar - Proceedings of the 2014 International Conference on …, 2014 - dl.acm.org
With the advent of the multicore era, it is clear that future growth in application performance
will primarily come from increased parallelism. We believe parallelism should be introduced …

Cooperative scheduling of parallel tasks with general synchronization patterns

S Imam, V Sarkar - ECOOP 2014–Object-Oriented Programming: 28th …, 2014 - Springer
In this paper, we address the problem of scheduling parallel tasks with general
synchronization patterns using a cooperative runtime. Current implementations for task …

Benchmarking heterogeneous HPC systems including reconfigurable fabrics: Community aspirations for ideal comparisons

P Jamieson, A Sanaullah… - 2018 IEEE High …, 2018 - ieeexplore.ieee.org
We describe a progressive philosophy to help benchmark systems and designs in the High
Performance Computing (HPC) domain. These systems now include heterogeneous multi …

NEMESYS: near-memory graph copy enhanced system-software

S Rheindt, A Fried, O Lenke, L Nolte, T Wild… - Proceedings of the …, 2019 - dl.acm.org
Despite tackling the memory and power walls over the last decades, new challenges for
manycore architectures arose due to the emergence of ever increasing memory …

COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loop

P Mishra, VK Nandivada - ACM Transactions on Architecture and Code …, 2024 - dl.acm.org
Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the
threads, using a programmer-specified scheduling policy. While the existing scheduling …

Pegasus: efficient data transfers for PGAS languages on non-cache-coherent many-cores

M Mohr, C Tradowsky - Design, Automation & Test in Europe …, 2017 - ieeexplore.ieee.org
To improve scalability, some many-core architectures abandon global cache coherence, but
still provide a shared address space. Partitioning the shared memory and communicating …

Improved MHP analysis

A Sankar, S Chakraborty, VK Nandivada - Proceedings of the 25th …, 2016 - dl.acm.org
May-Happen-in-Parallel (MHP) analysis is becoming the backbone of many of the parallel
analyses and optimizations. In this paper, we present new approaches to do MHP analysis …

UWOmppro: UWOmp++ with Point-to-Point Synchronization, Reduction and Schedules

A Agrawal, VK Nandivada - 2023 32nd International …, 2023 - ieeexplore.ieee.org
OpenMP is one of the most popular APIs widely used to realize parallelism in C/C++ and
FORTRAN programs. For efficient execution, an OpenMP program internally creates a team …

Optimizing recursive task parallel programs

S Gupta, R Shrivastava, VK Nandivada - Proceedings of the International …, 2017 - dl.acm.org
We present a new optimization DECAF that optimizes recursive task parallel (RTP)
programs by reducing the task creation and termination overheads. DECAF reduces the task …

Chunking loops with non-uniform workloads

IK Prabhu, VK Nandivada - Proceedings of the 34th ACM International …, 2020 - dl.acm.org
Task-parallel languages such as X10 implement dynamic lightweight task-parallel execution
model, where programmers are encouraged to express the ideal parallelism in the program …