Sparse GPU kernels for deep learning

T Gale, M Zaharia, C Young… - … Conference for High …, 2020 - ieeexplore.ieee.org
Scientific workloads have traditionally exploited high levels of sparsity to accelerate
computation and reduce memory requirements. While deep neural networks can be made …

A recursive algebraic coloring technique for hardware-efficient symmetric sparse matrix-vector multiplication

C Alappat, A Basermann, AR Bishop… - ACM Transactions on …, 2020 - dl.acm.org
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building
block for many numerical linear algebra kernel operations or graph traversal applications …
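A minimal serial sketch of what SymmSpMV computes may help here: only the upper triangle of A (diagonal included) is stored in CSR, and each stored entry contributes to both y[row] and y[col]. Struct and function names below are illustrative; the scattered updates to y[col] are exactly what make the kernel hard to parallelize, which this sketch does not attempt to address.

```cpp
#include <cstddef>
#include <vector>

// Upper triangle of a symmetric matrix in CSR form (illustrative names).
struct CsrUpper {
    std::vector<std::size_t> rowPtr;  // size n+1
    std::vector<std::size_t> colIdx;  // column index per stored nonzero (colIdx >= row)
    std::vector<double>      val;     // stored nonzero values
};

// y = A*x where only the upper triangle of A is stored.
void symmSpMV(const CsrUpper& A, const std::vector<double>& x,
              std::vector<double>& y) {
    const std::size_t n = A.rowPtr.size() - 1;
    y.assign(n, 0.0);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
            const std::size_t col = A.colIdx[k];
            const double a = A.val[k];
            y[row] += a * x[col];           // contribution of the stored entry
            if (col != row)
                y[col] += a * x[row];       // mirrored entry implied by symmetry
        }
    }
}
```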

Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations

N Srivastava, H Jin, S Smith, H Rong… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …

Procrustes: a dataflow and accelerator for sparse deep neural network training

D Yang, A Ghasemazar, X Ren, M Golub… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
The success of DNN pruning has led to the development of energy-efficient inference
accelerators that support pruned models with sparse weight and activation tensors. Because …

Traversing large graphs on GPUs with unified memory

P Gera, H Kim, P Sao, H Kim, D Bader - Proceedings of the VLDB …, 2020 - dl.acm.org
Due to the limited capacity of GPU memory, the majority of prior work on graph applications
on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware …

Pruning via iterative ranking of sensitivity statistics

S Verdenius, M Stol, P Forré - arXiv preprint arXiv:2006.00896, 2020 - arxiv.org
With the introduction of SNIP [arXiv:1810.02340v2], it has been demonstrated that modern
neural networks can effectively be pruned before training. Yet, its sensitivity criterion has …
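SNIP's sensitivity criterion is commonly summarized as ranking connections by |g * w|, the magnitude of the loss gradient times the weight, taken on a small batch at initialization, and keeping the top-k. The sketch below illustrates that ranking only; function names are hypothetical and it does not reproduce the iterative scheme this paper proposes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// SNIP-style saliency ranking: score each weight by |g * w| and keep the
// top-k connections. Gradients are assumed to come from the training
// framework; ties at the threshold may keep slightly more than k weights.
std::vector<bool> snipStyleMask(const std::vector<double>& weights,
                                const std::vector<double>& grads,
                                std::size_t keep) {
    const std::size_t n = weights.size();
    if (keep == 0) return std::vector<bool>(n, false);

    std::vector<double> saliency(n);
    for (std::size_t i = 0; i < n; ++i)
        saliency[i] = std::fabs(grads[i] * weights[i]);

    // Threshold = k-th largest saliency score.
    std::vector<double> sorted = saliency;
    std::nth_element(sorted.begin(), sorted.begin() + (keep - 1),
                     sorted.end(), std::greater<double>());
    const double threshold = sorted[keep - 1];

    std::vector<bool> mask(n);
    for (std::size_t i = 0; i < n; ++i)
        mask[i] = saliency[i] >= threshold;  // true = connection survives
    return mask;
}
```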

Automatic generation of efficient sparse tensor format conversion routines

S Chou, F Kjolstad, S Amarasinghe - Proceedings of the 41st ACM …, 2020 - dl.acm.org
This paper shows how to generate code that efficiently converts sparse tensors between
disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We …
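For context, the kind of routine the paper generates automatically looks like the hand-written sketch below: converting coordinate (COO) triples, assumed already sorted by row, into CSR. Struct and function names are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// COO triples (row, col, val), assumed sorted by row.
struct Coo {
    std::size_t rows = 0;
    std::vector<std::size_t> row, col;
    std::vector<double> val;
};

// CSR: row pointers plus column indices and values.
struct Csr {
    std::vector<std::size_t> rowPtr, colIdx;
    std::vector<double> val;
};

Csr cooToCsr(const Coo& a) {
    Csr out;
    out.rowPtr.assign(a.rows + 1, 0);
    // Count nonzeros per row, then prefix-sum the counts into row pointers.
    for (std::size_t r : a.row) ++out.rowPtr[r + 1];
    for (std::size_t i = 0; i < a.rows; ++i) out.rowPtr[i + 1] += out.rowPtr[i];
    // Because the input is row-sorted, columns and values copy over in order.
    out.colIdx = a.col;
    out.val = a.val;
    return out;
}
```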

A C++ GraphBLAS: specification, implementation, parallelisation, and evaluation

AN Yzelman, D Di Nardo, JM Nash, WJ Suijlen - Preprint, 2020 - albert-jan.yzelman.net
The GraphBLAS is a programming model that expresses graph algorithms in linear
algebraic terms. It takes an easy-to-use, data-centric view where algebraic operations …
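To make "graph algorithms in linear algebraic terms" concrete, the sketch below expresses one BFS expansion step as a Boolean sparse matrix-vector product with already-visited vertices masked out. This is a plain C++ illustration of the idea, not the GraphBLAS C++ API itself; all names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Boolean adjacency matrix in CSR form: one row per vertex, colIdx lists neighbors.
struct BoolCsr {
    std::vector<std::size_t> rowPtr;
    std::vector<std::size_t> colIdx;
};

// next = (A^T * frontier) masked by !visited, over the (OR, AND) semiring.
std::vector<bool> bfsStep(const BoolCsr& A, const std::vector<bool>& frontier,
                          const std::vector<bool>& visited) {
    const std::size_t n = A.rowPtr.size() - 1;
    std::vector<bool> next(n, false);
    for (std::size_t u = 0; u < n; ++u) {
        if (!frontier[u]) continue;            // only frontier vertices contribute
        for (std::size_t k = A.rowPtr[u]; k < A.rowPtr[u + 1]; ++k) {
            const std::size_t v = A.colIdx[k];
            if (!visited[v]) next[v] = true;   // mask: skip visited vertices
        }
    }
    return next;
}
```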

IntersectX: an efficient accelerator for graph mining

G Rao, J Chen, J Yik, X Qian - arXiv preprint arXiv:2012.10848, 2020 - arxiv.org
Graph pattern mining applications try to find all embeddings that match specific patterns.
Compared to traditional graph computation, graph mining applications are computation …
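The snippet does not describe the accelerator itself, but a core primitive in graph pattern mining is intersecting sorted neighbor lists; the CPU-side sketch below shows that primitive in its simplest setting, triangle counting. It is purely illustrative and says nothing about the proposed hardware design.

```cpp
#include <cstddef>
#include <vector>

// Count triangles by intersecting the sorted neighbor lists of each edge's
// endpoints. adj[u] must be sorted ascending; each triangle is counted once.
std::size_t countTriangles(const std::vector<std::vector<std::size_t>>& adj) {
    std::size_t triangles = 0;
    for (std::size_t u = 0; u < adj.size(); ++u) {
        for (std::size_t v : adj[u]) {
            if (v <= u) continue;  // visit each undirected edge (u, v) with u < v once
            // Merge-style intersection of adj[u] and adj[v], counting common
            // neighbors w with w > v so each triangle {u, v, w} is counted once.
            std::size_t i = 0, j = 0;
            while (i < adj[u].size() && j < adj[v].size()) {
                if (adj[u][i] < adj[v][j]) ++i;
                else if (adj[u][i] > adj[v][j]) ++j;
                else {
                    if (adj[u][i] > v) ++triangles;
                    ++i; ++j;
                }
            }
        }
    }
    return triangles;
}
```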

Field programmable gate arrays for enhancing the speed and energy efficiency of quantum dynamics simulations

JM Rodríguez-Borbón, A Kalantar… - Journal of chemical …, 2020 - ACS Publications
We present the first application of field programmable gate arrays (FPGAs) as new,
customizable hardware architectures for carrying out fast and energy-efficient quantum …