CVR: Efficient vectorization of SpMV on x86 processors

C Hong, A Sukumaran-Rajam, I Nisa, K Singh… - Proceedings of the 24th …, 2019 - dl.acm.org

Tiling is a key technique for data locality optimization and is widely used in
high-performance implementations of dense matrix-matrix multiplication for multicore/manycore …

Cited by 162 Related articles All 4 versions

[PDF] arxiv.org

AIBench: an industry standard internet service AI benchmark suite

W Gao, F Tang, L Wang, J Zhan, C Lan, C Luo… - arXiv preprint arXiv …, 2019 - arxiv.org

Today's Internet Services are undergoing fundamental changes and shifting to an intelligent
computing era where AI is widely employed to augment services. In this context, many …

Cited by 40 Related articles All 4 versions

[PDF] sciencedirect.com

Optimizing sparse tensor times matrix on GPUs

Y Ma, J Li, X Wu, C Yan, J Sun, R Vuduc - Journal of Parallel and …, 2019 - Elsevier

This work optimizes tensor-times-dense matrix multiply (Ttm) for general sparse and
semi-sparse tensors on CPU and NVIDIA GPU platforms. Ttm is a computational kernel in tensor …

Cited by 38 Related articles All 5 versions

Acorns: A framework for accelerating deep neural networks with input sparsity

X Dong, L Liu, P Zhao, G Li, J Li… - 2019 28th …, 2019 - ieeexplore.ieee.org

Deep neural networks have been employed in a broad range of applications, including face
detection, natural language processing, and autonomous driving. Yet, the neural networks …

Cited by 16 Related articles All 3 versions

[PDF] nsf.gov

Scalability of hybrid SpMV on Intel Xeon Phi Knights Landing

BA Page, PM Kogge - 2019 International Conference on High …, 2019 - ieeexplore.ieee.org

SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of
applications that are memory bandwidth and communication, not flop, driven. Sparsity and …

Cited by 8 Related articles All 3 versions

Benchmarking SpMV methods on many-core platforms

B Xie, Z Jia, Y Bao - … International Symposium, Bench 2018, Seattle, WA …, 2019 - Springer

SpMV is an essential kernel existing in many HPC and data center applications. Meanwhile,
the emerging many-core hardware provides promising computational power, and is widely …

Cited by 3 Related articles All 2 versions

[PDF] arxiv.org

Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications

C Liu, H Yang, X Liu, Z Luan, D Qian - arXiv preprint arXiv:1910.13346, 2019 - arxiv.org

Modern optimizing compilers are able to exploit memory access or computation patterns to
generate vectorization codes. However, such patterns in irregular applications are unknown …

[BOOK][B] Code Optimization on GPUs

C Hong - 2019 - search.proquest.com

Graphic Processing Units (GPUs) have become popular in the last decade due to
their high memory bandwidth and powerful computing capacity. Nevertheless, achieving …

[CITATION][C] A unified framework for benchmarking sparse matrix-vector multiplication methods

E Sarili - Fen Bilimleri Enstitüsü
