Adaptive sparse tiling for sparse matrix multiplication

C Hong, A Sukumaran-Rajam, I Nisa, K Singh… - Proceedings of the 24th …, 2019 - dl.acm.org
Tiling is a key technique for data locality optimization and is widely used in high-
performance implementations of dense matrix-matrix multiplication for multicore/manycore …

AIBench: an industry standard internet service AI benchmark suite

W Gao, F Tang, L Wang, J Zhan, C Lan, C Luo… - arXiv preprint arXiv …, 2019 - arxiv.org
Today's Internet Services are undergoing fundamental changes and shifting to an intelligent
computing era where AI is widely employed to augment services. In this context, many …

Optimizing sparse tensor times matrix on GPUs

Y Ma, J Li, X Wu, C Yan, J Sun, R Vuduc - Journal of Parallel and …, 2019 - Elsevier
This work optimizes tensor-times-dense matrix multiply (Ttm) for general sparse and semi-
sparse tensors on CPU and NVIDIA GPU platforms. Ttm is a computational kernel in tensor …

Acorns: A framework for accelerating deep neural networks with input sparsity

X Dong, L Liu, P Zhao, G Li, J Li… - 2019 28th …, 2019 - ieeexplore.ieee.org
Deep neural networks have been employed in a broad range of applications, including face
detection, natural language processing, and autonomous driving. Yet, the neural networks …

Scalability of hybrid SpMV on Intel Xeon Phi Knights Landing

BA Page, PM Kogge - 2019 International Conference on High …, 2019 - ieeexplore.ieee.org
SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of
applications that are memory bandwidth and communication, not flop, driven. Sparsity and …

Benchmarking SpMV methods on many-core platforms

B Xie, Z Jia, Y Bao - … International Symposium, Bench 2018, Seattle, WA …, 2019 - Springer
SpMV is an essential kernel existing in many HPC and data center applications. Meanwhile,
the emerging many-core hardware provides promising computational power, and is widely …

Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications

C Liu, H Yang, X Liu, Z Luan, D Qian - arXiv preprint arXiv:1910.13346, 2019 - arxiv.org
Modern optimizing compilers are able to exploit memory access or computation patterns to
generate vectorization codes. However, such patterns in irregular applications are unknown …

[BOOK][B] Code Optimization on GPUs

C Hong - 2019 - search.proquest.com
Graphic Processing Units (GPUs) have become popular in the last decade due to
their high memory bandwidth and powerful computing capacity. Nevertheless, achieving …

[CITATION][C] A unified framework for benchmarking sparse matrix-vector multiplication methods

E Sarili - Fen Bilimleri Enstitüsü