OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success as model sizes have grown.
LLM size grows by 240× every two years, which outpaces the …
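The idea named in the title can be illustrated with a toy sketch (my own paraphrase of outlier-victim pair quantization, not the paper's actual encoding or hardware datapath; the function name, percentile threshold, and pairing rule are illustrative assumptions): values above a threshold are treated as outliers, each outlier's paired neighbor (the "victim") is pruned to zero, and the freed budget lets the outlier be kept at a coarser scale.

```python
import numpy as np

def ovp_quantize(x, bits=4, outlier_pct=99.0):
    """Toy sketch of outlier-victim pair quantization (illustrative,
    not the paper's encoding): normal values use a fine scale fit to
    the non-outlier range; each outlier zeroes its paired neighbor
    (the "victim") and is re-quantized at a coarser scale that
    covers its magnitude."""
    qmax = 2 ** (bits - 1) - 1
    thresh = np.percentile(np.abs(x), outlier_pct)
    fine = thresh / qmax                           # scale for normal values
    out = np.clip(np.round(x / fine), -qmax, qmax) * fine
    coarse = np.max(np.abs(x)) / qmax              # scale for outliers
    for i in np.flatnonzero(np.abs(x) > thresh):
        victim = i + 1 if i % 2 == 0 else i - 1    # paired neighbor
        if 0 <= victim < len(x):
            out[victim] = 0.0                      # prune the victim
        out[i] = np.round(x[i] / coarse) * coarse  # preserve the outlier
    return out
```

The hardware-friendliness comes from the fixed pairing: an outlier never needs extra storage, only its neighbor's slot.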

ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C Guo, C Zhang, J Leng, Z Liu, F Yang… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …
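The fixed-point integer baseline the abstract contrasts against can be sketched as a generic symmetric per-tensor scheme (not ANT's adaptive data type; names are illustrative):

```python
import numpy as np

def int_quantize(x, bits=8):
    """Symmetric fixed-point integer quantization with one uniform
    per-tensor scale -- the kind of baseline the abstract contrasts
    with adaptive numerical data types."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate floating-point values."""
    return q.astype(np.float32) * scale
```

A single uniform scale wastes precision when the value distribution has long tails, which is the gap adaptive types aim to close.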

VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling

Z Liu, J Leng, Z Zhang, Q Chen, C Li… - Proceedings of the 27th …, 2022 - dl.acm.org
Deep learning (DL) models have achieved great success in many application domains. As
such, many industrial companies such as Google and Facebook have acknowledged the …

Crescent: taming memory irregularities for accelerating deep point cloud analytics

Y Feng, G Hammonds, Y Gan, Y Zhu - Proceedings of the 49th Annual …, 2022 - dl.acm.org
3D perception in point clouds is transforming the perception ability of future intelligent
machines. Point cloud algorithms, however, are plagued by irregular memory accesses …

Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors

S Barrachina, MF Dolz, P San Juan… - Journal of Parallel and …, 2022 - Elsevier
Convolutional Neural Networks (CNNs) play a crucial role in many image
recognition and classification tasks, recommender systems, brain-computer interfaces, etc …

An accelerator for sparse convolutional neural networks leveraging systolic general matrix-matrix multiplication

M Soltaniyeh, RP Martin, S Nagarakatte - ACM Transactions on …, 2022 - dl.acm.org
This article proposes a novel hardware accelerator for the inference task with sparse
convolutional neural networks (CNNs) by building a hardware unit to perform Image to …
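The truncated "Image to …" phrase appears to describe an im2col-style transformation, which is what lets a systolic GEMM unit run convolutions. A minimal single-channel, stride-1 sketch (illustrative, not the accelerator's actual hardware unit):

```python
import numpy as np

def im2col(img, kh, kw):
    """Unfold every kh x kw sliding window of a single-channel image
    into one column (stride 1, no padding), so that convolution
    becomes a single matrix-matrix product."""
    h, w = img.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

# Convolution as GEMM: a (1 x 9) kernel row times a (9 x 4) column matrix.
img = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))                # all-ones kernel: each output is a window sum
out = (kernel.ravel() @ im2col(img, 3, 3)).reshape(2, 2)
```

Sparsity in the kernel then maps to skippable rows of the GEMM, which is where a sparse systolic design saves work.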

Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices

E Reggiani, A Pappalardo, M Doblas… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Deep Neural Network (DNN) inference based on quantized narrow-precision integer data
represents a promising research direction toward efficient deep learning computations on …
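Mixed-precision integer inference of the kind the abstract describes can be sketched end to end (illustrative bit-widths and symmetric per-tensor scales; not Mix-GEMM's actual kernels or ISA extensions):

```python
import numpy as np

def quant(x, bits):
    """Symmetric per-tensor quantization to a signed `bits`-bit range."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).astype(np.int32), scale

def mixed_gemm(A, W, a_bits=8, w_bits=4):
    """Quantize activations and weights to different integer widths,
    multiply entirely in integer arithmetic, then rescale the int32
    accumulator back to floating point."""
    qa, sa = quant(A, a_bits)
    qw, sw = quant(W, w_bits)
    return (qa @ qw) * (sa * sw)
```

Keeping the accumulator wide (int32) while the operands stay narrow is what makes the integer datapath both cheap and numerically safe.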

Accelerating sparse convolution with column vector-wise sparsity

Y Tan, K Han, K Zhao, X Yu, Z Du… - Advances in …, 2022 - proceedings.neurips.cc
Weight sparsity is a promising approach to reducing the model size and computation cost of
convolutional neural networks (CNNs). Nevertheless, non-zero weights often distribute …
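Column vector-wise sparsity can be sketched as structured magnitude pruning over fixed-length column vectors (an illustrative reading of the title; the function name, vector length, and keep ratio are assumptions):

```python
import numpy as np

def prune_column_vectors(W, vec_len=4, keep_ratio=0.5):
    """Split every column of W into vectors of length vec_len and
    zero out the lowest-magnitude vectors in each column, keeping a
    keep_ratio fraction; surviving non-zeros stay contiguous, which
    is what makes the pattern hardware-friendly."""
    W = np.array(W, dtype=float)              # work on a copy
    rows, cols = W.shape
    assert rows % vec_len == 0
    Wv = W.reshape(rows // vec_len, vec_len, cols)
    norms = np.linalg.norm(Wv, axis=1)        # one norm per column vector
    k = int(np.ceil(keep_ratio * norms.shape[0]))
    keep = np.argsort(-norms, axis=0)[:k]     # top-k vectors per column
    mask = np.zeros_like(norms, dtype=bool)
    mask[keep, np.arange(cols)] = True
    Wv *= mask[:, None, :]                    # zero the pruned vectors
    return W
```

Unlike unstructured sparsity, whole zeroed vectors can be skipped with a single index, so the irregular-distribution problem the abstract mentions is sidestepped by construction.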

Prospects for integrated signal processing and advanced processing architectures

梁兴东, 李焱磊, 刘云龙, 郭宇豪, 解玉凤, 徐兴元… - 信号 …, 2022 - signal.ejournal.org.cn
Multifunctional integrated systems are an important development direction in electronic information technology, and integrated signal processing is a key enabling technology: it is of great significance for resource sharing and efficient coordination among functions, while also raising new … from algorithms to processing architectures

mNPUsim: Evaluating the effect of sharing resources in multi-core NPUs

S Hwang, S Lee, J Kim, H Kim… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Multi-core neural processing units (NPUs) have emerged to scale the computation capability
of NPUs to efficiently support diverse machine learning tasks. In such multi-core NPUs …