IMSuite: A benchmark suite for simulating distributed algorithms

S Imam, V Sarkar - Proceedings of the 2014 International Conference on …, 2014 - dl.acm.org

With the advent of the multicore era, it is clear that future growth in application performance
will primarily come from increased parallelism. We believe parallelism should be introduced …

Cited by 97 Related articles All 2 versions

[PDF] semanticscholar.org

Cooperative scheduling of parallel tasks with general synchronization patterns

S Imam, V Sarkar - ECOOP 2014–Object-Oriented Programming: 28th …, 2014 - Springer

In this paper, we address the problem of scheduling parallel tasks with general
synchronization patterns using a cooperative runtime. Current implementations for task …

Cited by 31 Related articles All 4 versions

[PDF] bu.edu

Benchmarking heterogeneous HPC systems including reconfigurable fabrics: Community aspirations for ideal comparisons

P Jamieson, A Sanaullah… - 2018 IEEE High …, 2018 - ieeexplore.ieee.org

We describe a progressive philosophy to help benchmark systems and designs in the High
Performance Computing (HPC) domain. These systems now include heterogeneous multi …

Cited by 13 Related articles All 4 versions

NEMESYS: near-memory graph copy enhanced system-software

S Rheindt, A Fried, O Lenke, L Nolte, T Wild… - Proceedings of the …, 2019 - dl.acm.org

Despite tackling the memory and power walls over the last decades, new challenges for
manycore architectures arose due to the emergence of ever increasing memory …

Cited by 13 Related articles All 2 versions

[PDF] acm.org

COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loop

P Mishra, VK Nandivada - ACM Transactions on Architecture and Code …, 2024 - dl.acm.org

Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the
threads, using a programmer-specified scheduling policy. While the existing scheduling …

Cited by 1 Related articles

[PDF] date-conference.com

Pegasus: efficient data transfers for PGAS languages on non-cache-coherent many-cores

M Mohr, C Tradowsky - Design, Automation & Test in Europe …, 2017 - ieeexplore.ieee.org

To improve scalability, some many-core architectures abandon global cache coherence, but
still provide a shared address space. Partitioning the shared memory and communicating …

Cited by 12 Related articles All 5 versions

[PDF] iitm.ac.in

Improved MHP analysis

A Sankar, S Chakraborty, VK Nandivada - Proceedings of the 25th …, 2016 - dl.acm.org

May-Happen-in-Parallel (MHP) analysis is becoming the backbone of many of the parallel
analyses and optimizations. In this paper, we present new approaches to do MHP analysis …

Cited by 13 Related articles All 5 versions

UWOmp_pro: UWOmp++ with Point-to-Point Synchronization, Reduction and Schedules

A Agrawal, VK Nandivada - 2023 32nd International …, 2023 - ieeexplore.ieee.org

OpenMP is one of the most popular APIs widely used to realize parallelism in C/C++ and
FORTRAN programs. For efficient execution, an OpenMP program internally creates a team …

Optimizing recursive task parallel programs

S Gupta, R Shrivastava, VK Nandivada - Proceedings of the International …, 2017 - dl.acm.org

We present a new optimization DECAF that optimizes recursive task parallel (RTP)
programs by reducing the task creation and termination overheads. DECAF reduces the task …

Cited by 9 Related articles All 3 versions

[PDF] iitm.ac.in

Chunking loops with non-uniform workloads

IK Prabhu, VK Nandivada - Proceedings of the 34th ACM International …, 2020 - dl.acm.org

Task-parallel languages such as X10 implement dynamic lightweight task-parallel execution
model, where programmers are encouraged to express the ideal parallelism in the program …

Cited by 5 Related articles All 2 versions
