Transformer-based large language models (LLMs) have achieved great success as model size grows. LLM size grows by 240× every two years, which outpaces the …
Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation …
In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single …
Quantization of deep neural networks (DNNs) has proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach without the …
Sparse general matrix-matrix multiplication (SpGEMM) is one of the most fundamental building blocks in sparse linear solvers, graph processing frameworks and machine learning …
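The snippet above names SpGEMM as a basic building block; a minimal sketch of what that operation looks like is given below, using SciPy's CSR format. The matrix sizes and densities are illustrative assumptions, not values from any of the cited works.

```python
# SpGEMM sketch: multiply two sparse matrices, producing a sparse result.
import numpy as np
from scipy.sparse import random as sparse_random

# Illustrative inputs: 1000x1000 matrices with ~1% nonzeros, stored in CSR.
A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
B = sparse_random(1000, 1000, density=0.01, format="csr", random_state=1)

# SpGEMM: the product of two sparse matrices is itself sparse (CSR here).
C = A @ B

print(f"nnz(A)={A.nnz}, nnz(B)={B.nnz}, nnz(C)={C.nnz}")
```

The key property that makes SpGEMM hard to accelerate is visible even in this toy case: the number and positions of nonzeros in C are not known until the multiplication is performed.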
This paper presents a detailed overview of sparsity exploitation in deep neural network (DNN) accelerators. Despite the algorithmic advancements which drove DNNs to become …
Due to complex interactions among various deep neural network (DNN) optimization techniques, modern DNNs can have weights and activations that are dense or sparse with …
Quantization is a technique to reduce the computation and memory cost of DNN models, which are getting increasingly large. Existing quantization solutions use fixed-point integer …
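As a concrete illustration of the fixed-point integer quantization mentioned above, here is a minimal sketch of symmetric per-tensor int8 quantization. The scale choice and the random weight tensor are assumptions for illustration only, not the scheme of any particular paper.

```python
# Symmetric int8 quantization sketch: map float weights to integers plus a scale.
import numpy as np

def quantize_int8(w: np.ndarray):
    # Per-tensor scale chosen so the largest magnitude maps to 127.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```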
Deep learning (DL) models have achieved great success in many application domains. As such, many industrial companies such as Google and Facebook have acknowledged the …