Vectorization of multigrid codes using SIMD ISA extensions

NM Ho, WF Wong - 2017 IEEE High Performance Extreme …, 2017 - ieeexplore.ieee.org

With the growing importance of deep learning and energy-saving approximate computing,
half precision floating point arithmetic (FP16) is fast gaining popularity. Nvidia's recent …

Cited by 96


SIMD parallelization of applications that traverse irregular data structures

B Ren, G Agrawal, JR Larus… - Proceedings of the …, 2013 - ieeexplore.ieee.org

Fine-grained data parallelism is increasingly common in mainstream processors in the form
of longer vectors and on-chip GPUs. This paper develops support for exploiting such data …

Cited by 78


MicroSpec: Speculation-centric fine-grained parallelization for FSM computations

J Qiu, Z Zhao, B Ren - … of the 2016 International Conference on Parallel …, 2016 - dl.acm.org

Finite state machines (FSMs) are basic computation models that play essential roles in many
applications. Enabling efficient parallel FSM execution is critical to the performance of these …

Cited by 33


Combining SIMD and Many/Multi-core parallelism for finite state machines with enumerative speculation

P Jiang, G Agrawal - Proceedings of the 22nd ACM SIGPLAN …, 2017 - dl.acm.org

Finite State Machine (FSM) is the key kernel behind many popular applications, including
regular expression matching, text tokenization, and Huffman decoding. Parallelizing FSMs is …

Cited by 29


Optimizing and scaling HPCG on Tianhe-2: early experience

X Zhang, C Yang, F Liu, Y Liu, Y Lu - … 2014, Dalian, China, August 24-27 …, 2014 - Springer

In this paper, a first attempt has been made on optimizing and scaling HPCG on the world's
largest supercomputer, Tianhe-2. This early work focuses on the optimization of the CPU …

Cited by 36


Accelerating HPCG on Tianhe-2: a hybrid CPU-MIC algorithm

Y Liu, X Zhang, C Yang, F Liu… - 2014 20th IEEE …, 2014 - ieeexplore.ieee.org

In this paper, we propose a hybrid algorithm to enable and accelerate the High Performance
Conjugate Gradient (HPCG) benchmark on a heterogeneous node with an arbitrary number …

Cited by 20


A portable optimization engine for accelerating irregular data-traversal applications on SIMD architectures

B Ren, T Mytkowicz, G Agrawal - ACM Transactions on Architecture and …, 2014 - dl.acm.org

Fine-grained data parallelism is increasingly common in the form of longer vectors
integrated with mainstream processors (SSE, AVX) and various GPU architectures. This …

Cited by 14


Efficient scheduling of recursive control flow on GPUs

X Huo, S Krishnamoorthy, G Agrawal - Proceedings of the 27th …, 2013 - dl.acm.org

Graphics processing units (GPUs) have rapidly emerged as a very significant player in high
performance computing. Single instruction multiple thread (SIMT) pipelines are typically …

Cited by 19


Planning and composition of Web services with dynamic constraints using situation calculus

K Nariai, I Paik, M Shinozawa - The Fifth International …, 2005 - ieeexplore.ieee.org

Web service composition enables the creation of new and more valuable services to
combine and link existing services. However, the treatment of user constraints (as user …

Cited by 10


Combining SIMD and many/multi-core parallelism for finite-state machines with enumerative speculation

P Jiang, Y Xia, G Agrawal - ACM Transactions on Parallel Computing …, 2020 - dl.acm.org

Finite-state Machine (FSM) is the key kernel behind many popular applications, including
regular expression matching, text tokenization, and Huffman decoding. Parallelizing FSMs is …

Cited by 3
