Structure-driven optimizations for amorphous data-parallel programs

K Pingali, D Nguyen, M Kulkarni, M Burtscher… - Proceedings of the …, 2011 - dl.acm.org

For more than thirty years, the parallel programming community has used the dependence
graph as the main abstraction for reasoning about and exploiting parallelism in "regular" …

Cited by 537 · Related articles · All 20 versions

[PDF] researchgate.net

A survey on thread-level speculation techniques

A Estebanez, DR Llanos… - ACM Computing Surveys …, 2016 - dl.acm.org

Thread-Level Speculation (TLS) is a promising technique that allows the parallel execution
of sequential code without relying on a prior, compile-time-dependence analysis. In this …

Cited by 44 · Related articles · All 4 versions

[PDF] researchgate.net

Morph algorithms on GPUs

R Nasre, M Burtscher, K Pingali - Proceedings of the 18th ACM SIGPLAN …, 2013 - dl.acm.org

There is growing interest in using GPUs to accelerate graph algorithms such as breadth-first
search, computing page-ranks, and finding shortest paths. However, these algorithms do not …

Cited by 135 · Related articles · All 10 versions

[PDF] academia.edu

Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms

MA Hassaan, M Burtscher, K Pingali - ACM SIGPLAN Notices, 2011 - dl.acm.org

Outside of computational science, most problems are formulated in terms of irregular data
structures such as graphs, trees and sets. Unfortunately, we understand relatively little about …

Cited by 120 · Related articles · All 13 versions

[PDF] psu.edu

A fast GPU algorithm for graph connectivity

J Soman, K Kishore… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org

Graphics processing units provide a large computational power at a very low price which
position them as an ubiquitous accelerator. General purpose programming on the graphics …

Cited by 112 · Related articles · All 12 versions

[PDF] iitm.ac.in

Parallel inclusion-based points-to analysis

M Méndez-Lojo, A Mathew, K Pingali - Proceedings of the ACM …, 2010 - dl.acm.org

Inclusion-based points-to analysis provides a good trade-off between precision of results
and speed of analysis, and it has been incorporated into several production compilers …

Cited by 89 · Related articles · All 14 versions

[PDF] psu.edu

SIMD parallelization of applications that traverse irregular data structures

B Ren, G Agrawal, JR Larus… - Proceedings of the …, 2013 - ieeexplore.ieee.org

Fine-grained data parallelism is increasingly common in mainstream processors in the form
of longer vectors and on-chip GPUs. This paper develops support for exploiting such data …

Cited by 78 · Related articles · All 7 versions

[PDF] psu.edu

Accelerating irregular algorithms on GPGPUs using fine-grain hardware worklists

JY Kim, C Batten - 2014 47th Annual IEEE/ACM International …, 2014 - ieeexplore.ieee.org

Although GPGPUs are traditionally used to accelerate workloads with regular control and
memory-access structure, recent work has shown that GPGPUs can also achieve significant …

Cited by 58 · Related articles · All 9 versions

Parallel FPGA routing: Survey and challenges

M Stojilović - 2017 27th International Conference on Field …, 2017 - ieeexplore.ieee.org

As transistor scaling is slowing down [1], other opportunities for ensuring continuous
performance increase have to be explored. Field programmable gate arrays (FPGAs) are in …

Cited by 35 · Related articles · All 2 versions

[PDF] archive.org

Parallel FPGA routing based on the operator formulation

YOM Moctar, P Brisk - Proceedings of the 51st Annual Design …, 2014 - dl.acm.org

We have implemented an FPGA routing algorithm on a shared memory multi-processor
using the Galois API, which offers speculative parallelism in software. The router is a parallel …

Cited by 47 · Related articles · All 5 versions
