Sparse GPU kernels for deep learning

T Gale, M Zaharia, C Young… - … Conference for High …, 2020 - ieeexplore.ieee.org
Scientific workloads have traditionally exploited high levels of sparsity to accelerate
computation and reduce memory requirements. While deep neural networks can be made …

MegaBlocks: Efficient sparse training with mixture-of-experts

T Gale, D Narayanan, C Young… - … of Machine Learning …, 2023 - proceedings.mlsys.org
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.
Our system is motivated by the limitations of current frameworks, which restrict the dynamic …

The tensor algebra compiler

F Kjolstad, S Kamil, S Chou, D Lugato… - Proceedings of the …, 2017 - dl.acm.org
Tensor algebra is a powerful tool with applications in machine learning, data analytics,
engineering and the physical sciences. Tensors are often sparse and compound operations …

ThunderSVM: A fast SVM library on GPUs and CPUs

Z Wen, J Shi, Q Li, B He, J Chen - Journal of Machine Learning Research, 2018 - jmlr.org
Support Vector Machines (SVMs) are classic supervised learning models for classification,
regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that …

CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication

W Liu, B Vinter - Proceedings of the 29th ACM on International …, 2015 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …

Faster CNNs with direct sparse convolutions and guided pruning

J Park, S Li, W Wen, PTP Tang, H Li, Y Chen… - arXiv preprint arXiv …, 2016 - arxiv.org
Phenomenally successful in practical inference problems, convolutional neural networks
(CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The …

Sparse tensor core: Algorithm and hardware co-design for vector-wise sparse neural networks on modern GPUs

M Zhu, T Zhang, Z Gu, Y Xie - Proceedings of the 52nd Annual IEEE …, 2019 - dl.acm.org
Deep neural networks have become the compelling solution for the applications such as
image classification, object detection, speech recognition, and machine translation …

The Combinatorial BLAS: Design, implementation, and applications

A Buluç, JR Gilbert - The International Journal of High …, 2011 - journals.sagepub.com
This paper presents a scalable high-performance software library to be used for graph
analysis and data mining. Large combinatorial graphs appear in many applications of high …

A recursive algebraic coloring technique for hardware-efficient symmetric sparse matrix-vector multiplication

C Alappat, A Basermann, AR Bishop… - ACM Transactions on …, 2020 - dl.acm.org
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building
block for many numerical linear algebra kernel operations or graph traversal applications …

Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations

N Srivastava, H Jin, S Smith, H Rong… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …