Sparse GPU kernels for deep learning

T Gale, M Zaharia, C Young… - … Conference for High …, 2020 - ieeexplore.ieee.org
Scientific workloads have traditionally exploited high levels of sparsity to accelerate
computation and reduce memory requirements. While deep neural networks can be made …

A recursive algebraic coloring technique for hardware-efficient symmetric sparse matrix-vector multiplication

C Alappat, A Basermann, AR Bishop… - ACM Transactions on …, 2020 - dl.acm.org
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building
block for many numerical linear algebra kernel operations or graph traversal applications …
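A minimal serial sketch of what SymmSpMV computes may help here: only the upper triangle of A (diagonal included) is stored in CSR, and each stored entry contributes to both y[row] and y[col]. Struct and function names below are illustrative; the scattered updates to y[col] are exactly what make the kernel hard to parallelize, which this sketch does not attempt to address.

```cpp
#include <cstddef>
#include <vector>

// Upper triangle of a symmetric matrix in CSR form (illustrative names).
struct CsrUpper {
    std::vector<std::size_t> rowPtr;  // size n+1
    std::vector<std::size_t> colIdx;  // column index per stored nonzero (colIdx >= row)
    std::vector<double>      val;     // stored nonzero values
};

// y = A*x where only the upper triangle of A is stored.
void symmSpMV(const CsrUpper& A, const std::vector<double>& x,
              std::vector<double>& y) {
    const std::size_t n = A.rowPtr.size() - 1;
    y.assign(n, 0.0);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
            const std::size_t col = A.colIdx[k];
            const double a = A.val[k];
            y[row] += a * x[col];           // contribution of the stored entry
            if (col != row)
                y[col] += a * x[row];       // mirrored entry implied by symmetry
        }
    }
}
```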

Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations

N Srivastava, H Jin, S Smith, H Rong… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …

Procrustes: a dataflow and accelerator for sparse deep neural network training

D Yang, A Ghasemazar, X Ren, M Golub… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
The success of DNN pruning has led to the development of energy-efficient inference
accelerators that support pruned models with sparse weight and activation tensors. Because …

Traversing large graphs on GPUs with unified memory

P Gera, H Kim, P Sao, H Kim, D Bader - Proceedings of the VLDB …, 2020 - dl.acm.org
Due to the limited capacity of GPU memory, the majority of prior work on graph applications
on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware …

Pruning via iterative ranking of sensitivity statistics

S Verdenius, M Stol, P Forré - arXiv preprint arXiv:2006.00896, 2020 - arxiv.org
With the introduction of SNIP [arXiv:1810.02340v2], it has been demonstrated that modern
neural networks can effectively be pruned before training. Yet, its sensitivity criterion has …
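SNIP's sensitivity criterion is commonly summarized as ranking connections by |g * w|, the magnitude of the loss gradient times the weight, taken on a small batch at initialization, and keeping the top-k. The sketch below illustrates that ranking only; function names are hypothetical and it does not reproduce the iterative scheme this paper proposes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// SNIP-style saliency ranking: score each weight by |g * w| and keep the
// top-k connections. Gradients are assumed to come from the training
// framework; ties at the threshold may keep slightly more than k weights.
std::vector<bool> snipStyleMask(const std::vector<double>& weights,
                                const std::vector<double>& grads,
                                std::size_t keep) {
    const std::size_t n = weights.size();
    if (keep == 0) return std::vector<bool>(n, false);

    std::vector<double> saliency(n);
    for (std::size_t i = 0; i < n; ++i)
        saliency[i] = std::fabs(grads[i] * weights[i]);

    // Threshold = k-th largest saliency score.
    std::vector<double> sorted = saliency;
    std::nth_element(sorted.begin(), sorted.begin() + (keep - 1),
                     sorted.end(), std::greater<double>());
    const double threshold = sorted[keep - 1];

    std::vector<bool> mask(n);
    for (std::size_t i = 0; i < n; ++i)
        mask[i] = saliency[i] >= threshold;  // true = connection survives
    return mask;
}
```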

Automatic generation of efficient sparse tensor format conversion routines

S Chou, F Kjolstad, S Amarasinghe - Proceedings of the 41st ACM …, 2020 - dl.acm.org
This paper shows how to generate code that efficiently converts sparse tensors between
disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We …
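For context, the kind of routine the paper generates automatically looks like the hand-written sketch below: converting coordinate (COO) triples, assumed already sorted by row, into CSR. Struct and function names are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// COO triples (row, col, val), assumed sorted by row.
struct Coo {
    std::size_t rows = 0;
    std::vector<std::size_t> row, col;
    std::vector<double> val;
};

// CSR: row pointers plus column indices and values.
struct Csr {
    std::vector<std::size_t> rowPtr, colIdx;
    std::vector<double> val;
};

Csr cooToCsr(const Coo& a) {
    Csr out;
    out.rowPtr.assign(a.rows + 1, 0);
    // Count nonzeros per row, then prefix-sum the counts into row pointers.
    for (std::size_t r : a.row) ++out.rowPtr[r + 1];
    for (std::size_t i = 0; i < a.rows; ++i) out.rowPtr[i + 1] += out.rowPtr[i];
    // Because the input is row-sorted, columns and values copy over in order.
    out.colIdx = a.col;
    out.val = a.val;
    return out;
}
```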

A C++ GraphBLAS: specification, implementation, parallelisation, and evaluation

AN Yzelman, D Di Nardo, JM Nash, WJ Suijlen - Preprint, 2020 - albert-jan.yzelman.net
The GraphBLAS is a programming model that expresses graph algorithms in linear
algebraic terms. It takes an easy-to-use, data-centric view where algebraic operations …
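To make "graph algorithms in linear algebraic terms" concrete, the sketch below expresses one BFS expansion step as a Boolean sparse matrix-vector product with already-visited vertices masked out. This is a plain C++ illustration of the idea, not the GraphBLAS C++ API itself; all names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Boolean adjacency matrix in CSR form: one row per vertex, colIdx lists neighbors.
struct BoolCsr {
    std::vector<std::size_t> rowPtr;
    std::vector<std::size_t> colIdx;
};

// next = (A^T * frontier) masked by !visited, over the (OR, AND) semiring.
std::vector<bool> bfsStep(const BoolCsr& A, const std::vector<bool>& frontier,
                          const std::vector<bool>& visited) {
    const std::size_t n = A.rowPtr.size() - 1;
    std::vector<bool> next(n, false);
    for (std::size_t u = 0; u < n; ++u) {
        if (!frontier[u]) continue;            // only frontier vertices contribute
        for (std::size_t k = A.rowPtr[u]; k < A.rowPtr[u + 1]; ++k) {
            const std::size_t v = A.colIdx[k];
            if (!visited[v]) next[v] = true;   // mask: skip visited vertices
        }
    }
    return next;
}
```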

IntersectX: an efficient accelerator for graph mining

G Rao, J Chen, J Yik, X Qian - arXiv preprint arXiv:2012.10848, 2020 - arxiv.org
Graph pattern mining applications try to find all embeddings that match specific patterns.
Compared to traditional graph computation, graph mining applications are computation …
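The snippet does not describe the accelerator itself, but a core primitive in graph pattern mining is intersecting sorted neighbor lists; the CPU-side sketch below shows that primitive in its simplest setting, triangle counting. It is purely illustrative and says nothing about the proposed hardware design.

```cpp
#include <cstddef>
#include <vector>

// Count triangles by intersecting the sorted neighbor lists of each edge's
// endpoints. adj[u] must be sorted ascending; each triangle is counted once.
std::size_t countTriangles(const std::vector<std::vector<std::size_t>>& adj) {
    std::size_t triangles = 0;
    for (std::size_t u = 0; u < adj.size(); ++u) {
        for (std::size_t v : adj[u]) {
            if (v <= u) continue;  // visit each undirected edge (u, v) with u < v once
            // Merge-style intersection of adj[u] and adj[v], counting common
            // neighbors w with w > v so each triangle {u, v, w} is counted once.
            std::size_t i = 0, j = 0;
            while (i < adj[u].size() && j < adj[v].size()) {
                if (adj[u][i] < adj[v][j]) ++i;
                else if (adj[u][i] > adj[v][j]) ++j;
                else {
                    if (adj[u][i] > v) ++triangles;
                    ++i; ++j;
                }
            }
        }
    }
    return triangles;
}
```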

Field programmable gate arrays for enhancing the speed and energy efficiency of quantum dynamics simulations

JM Rodríguez-Borbón, A Kalantar… - Journal of chemical …, 2020 - ACS Publications
We present the first application of field programmable gate arrays (FPGAs) as new,
customizable hardware architectures for carrying out fast and energy-efficient quantum …