Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

CVR: Efficient vectorization of SpMV on x86 processors

B Xie, J Zhan, X Liu, W Gao, Z Jia, X He… - Proceedings of the 2018 …, 2018 - dl.acm.org
Sparse Matrix-vector Multiplication (SpMV) is an important computation kernel widely used
in HPC and data centers. The irregularity of SpMV is a well-known challenge that limits …

Optimizing sparse matrix-vector multiplication for large-scale data analytics

D Buono, F Petrini, F Checconi, X Liu, X Que… - Proceedings of the …, 2016 - dl.acm.org
Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel, used by a large class of
numerical algorithms. Emerging big-data and machine learning applications are propelling …

Adaptive multi-level blocking optimization for sparse matrix vector multiplication on GPU

Y Nagasaka, A Nukada, S Matsuoka - Procedia Computer Science, 2016 - Elsevier
Sparse matrix vector multiplication (SpMV) is the dominant kernel in scientific simulations.
Many-core processors such as GPUs accelerate SpMV computations with high parallelism …

Parallel and scalable sparse basic linear algebra subprograms

W Liu - 2015 - nbi.ku.dk
Sparse basic linear algebra subprograms (BLAS) are fundamental building blocks for
numerous scientific computations and graph applications. Compared with dense BLAS …

Data analytics with NVLink: An SpMV case study

D Buono, F Artico, F Checconi, JW Choi… - Proceedings of the …, 2017 - dl.acm.org
A recent advancement in the world of heterogeneous computing, the NVLink interconnect
enables high-speed communication between GPUs and CPUs and among GPUs. In this …

An extensive survey of literature on efficient fault-tolerance design for integer parallel matrix–vector multiplications

K Chakradhari, B Gupta - ijarse.org
Matrix multiplication is widely used as a core operation in various signal-processing
applications, such as software-defined radio. The FFT processor is widely used in DSP and …

Improve the execution time by using GPU for complex application with SIMD

R Tiwari, M Sharma, KK Mehta - Indian Journal of Scientific Research, 2018 - go.gale.com
Nowadays, sequential processing is not sufficient for large data computations in the area of
computer science and technology. To solve the computation problem for large data, the …

A Survey on Matrix Multiplication for GPU

R Tiwari - International Journal of Advance Research in …, 2016 - ijarest.org
Nowadays, sequential processing is certainly not sufficient for large data computations in
the area of computer science and technology. The need for high-performance computation …

A memory access reduction technique for GPUs targeting sparse matrix-vector multiplication

長坂侑亮 (Y Nagasaka), 額田彰 (A Nukada), 松岡聡 (S Matsuoka) - Research Report, High Performance …, 2015 - ipsj.ixsq.nii.ac.jp
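Abstract: In scientific and engineering computing, when solving systems of linear equations with huge, sparse problem matrices,
sparse matrix-vector multiplication accounts for most of the execution time. For sparse matrix-vector multiplication on GPUs …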
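
A back-of-the-envelope, roofline-style estimate suggests why reducing memory accesses is the main lever for this kernel. The estimate assumes CSR storage with double-precision values (8 bytes) and 32-bit column indices (4 bytes). It ignores traffic for the row pointers, `x` and `y`, which would only lower the bound further. The 1 TB/s bandwidth is a hypothetical round number, not a figure from the paper.

```latex
\begin{aligned}
\text{bytes per nonzero} &\ge 8\ (\texttt{val}) + 4\ (\texttt{col\_idx}) = 12\ \text{B},\\
\text{flops per nonzero} &= 2\ (\text{one multiply, one add}),\\
I_{\text{CSR}} &\le \frac{2\ \text{flop}}{12\ \text{B}} = \frac{1}{6}\ \text{flop/B},\\
P_{\text{SpMV}} &\le I_{\text{CSR}} \cdot B_{\text{mem}}
  = \tfrac{1}{6} \times 10^{12}\ \text{B/s} \approx 1.67 \times 10^{11}\ \text{flop/s}
  \quad (\text{hypothetical } B_{\text{mem}} = 1\ \text{TB/s}).
\end{aligned}
```

Under these assumptions, throughput is capped by memory bandwidth rather than arithmetic. Every byte of matrix or vector traffic removed therefore raises the attainable flop rate proportionally.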