Accelerating tensor multiplication by exploring hybrid product with hardware and software co-design

Z Zhang, Z Fan, W Li, Y Qiu, Z Wang, X Ye… - Journal of Systems …, 2025 - Elsevier
Tensor multiplication holds a pivotal position in numerous applications. The existing
accelerators predominantly rely on inner or outer products for their computational strategies …

A scalable parallel matrix multiplication algorithm framework for structured matrices

李胜国, 廖霞, 于恒彪, 黄春, 姜浩, 逯喜燕… - 计算机工程与 …, 2024 - joces.nudt.edu.cn
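Abstract: Structured matrices, such as Cauchy, Toeplitz, Vandermonde, and Hankel matrices, play an important role in scientific computing and engineering applications.
Although these matrices are dense, they can be represented with only O(n) parameters (generators), where n …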

2.5D parallel matrix multiplication based on BLACS

廖霞, 李胜国, 卢宇彤, 杨灿群 - 计算机学报, 2021 - cjc.ict.ac.cn
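Abstract: Parallel matrix multiplication is one of the most important basic operations in linear algebra and a cornerstone of many scientific applications.
As high-performance computing (HPC) moves toward exascale computing, the communication overhead of parallel matrix multiplication accounts for a growing share …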

CSPACER: A reduced API set runtime for the space consistency model

KZ Ibrahim - The International Conference on High Performance …, 2021 - dl.acm.org
We present our design and implementation of a runtime for the Space Consistency model.
The Space Consistency model is a generalized form of the full-empty bit synchronization for …

Performance evaluation and modelling of single-precision matrix multiplication on Cerebras CS-2

R Matsuzaki, D Mukunoki… - SC24-W: Workshops of …, 2024 - ieeexplore.ieee.org
Although recent supercomputers have been improving their computational performance,
achieving performance scaling with respect to the number of nodes is not easy due to long …

A scalable parallel structured matrix multiplication algorithm framework

S LI, X LIAO, H YU, C HUANG, H JIANG… - Computer …, 2024 - joces.nudt.edu.cn
Structured matrices play an important role in scientific computing and engineering
applications, such as Cauchy, Toeplitz, Vandermonde, and Hankel matrices. Although these …