OliVe: Accelerating large language models via hardware-friendly outlier-victim pair quantization

C Guo, J Tang, W Hu, J Leng, C Zhang… - Proceedings of the 50th …, 2023 - dl.acm.org
Transformer-based large language models (LLMs) have achieved great success as model sizes have grown.
LLM size grows by 240× every two years, which outpaces the …
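The idea named in the title can be illustrated with a toy sketch (my own paraphrase of outlier-victim pair quantization, not the paper's actual encoding or hardware datapath; the function name, percentile threshold, and pairing rule are illustrative assumptions): values above a threshold are treated as outliers, each outlier's paired neighbor (the "victim") is pruned to zero, and the freed budget lets the outlier be kept at a coarser scale.

```python
import numpy as np

def ovp_quantize(x, bits=4, outlier_pct=99.0):
    """Toy sketch of outlier-victim pair quantization (illustrative,
    not the paper's encoding): normal values use a fine scale fit to
    the non-outlier range; each outlier zeroes its paired neighbor
    (the "victim") and is re-quantized at a coarser scale that
    covers its magnitude."""
    qmax = 2 ** (bits - 1) - 1
    thresh = np.percentile(np.abs(x), outlier_pct)
    fine = thresh / qmax                           # scale for normal values
    out = np.clip(np.round(x / fine), -qmax, qmax) * fine
    coarse = np.max(np.abs(x)) / qmax              # scale for outliers
    for i in np.flatnonzero(np.abs(x) > thresh):
        victim = i + 1 if i % 2 == 0 else i - 1    # paired neighbor
        if 0 <= victim < len(x):
            out[victim] = 0.0                      # prune the victim
        out[i] = np.round(x[i] / coarse) * coarse  # preserve the outlier
    return out
```

The hardware-friendliness comes from the fixed pairing: an outlier never needs extra storage, only its neighbor's slot.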

ANT: Exploiting adaptive numerical data type for low-bit deep neural network quantization

C Guo, C Zhang, J Leng, Z Liu, F Yang… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …
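The fixed-point integer baseline the abstract contrasts against can be sketched as a generic symmetric per-tensor scheme (not ANT's adaptive data type; names are illustrative):

```python
import numpy as np

def int_quantize(x, bits=8):
    """Symmetric fixed-point integer quantization with one uniform
    per-tensor scale -- the kind of baseline the abstract contrasts
    with adaptive numerical data types."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate floating-point values."""
    return q.astype(np.float32) * scale
```

A single uniform scale wastes precision when the value distribution has long tails, which is the gap adaptive types aim to close.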

VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling

Z Liu, J Leng, Z Zhang, Q Chen, C Li… - Proceedings of the 27th …, 2022 - dl.acm.org
Deep learning (DL) models have achieved great success in many application domains. As
such, many industrial companies such as Google and Facebook have acknowledged the …

Crescent: taming memory irregularities for accelerating deep point cloud analytics

Y Feng, G Hammonds, Y Gan, Y Zhu - Proceedings of the 49th Annual …, 2022 - dl.acm.org
3D perception in point clouds is transforming the perception ability of future intelligent
machines. Point cloud algorithms, however, are plagued by irregular memory accesses …

Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors

S Barrachina, MF Dolz, P San Juan… - Journal of Parallel and …, 2022 - Elsevier
Convolutional Neural Networks (CNNs) play a crucial role in many image
recognition and classification tasks, recommender systems, brain-computer interfaces, etc …

An accelerator for sparse convolutional neural networks leveraging systolic general matrix-matrix multiplication

M Soltaniyeh, RP Martin, S Nagarakatte - ACM Transactions on …, 2022 - dl.acm.org
This article proposes a novel hardware accelerator for the inference task with sparse
convolutional neural networks (CNNs) by building a hardware unit to perform Image to …
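The truncated "Image to …" phrase appears to describe an im2col-style transformation, which is what lets a systolic GEMM unit run convolutions. A minimal single-channel, stride-1 sketch (illustrative, not the accelerator's actual hardware unit):

```python
import numpy as np

def im2col(img, kh, kw):
    """Unfold every kh x kw sliding window of a single-channel image
    into one column (stride 1, no padding), so that convolution
    becomes a single matrix-matrix product."""
    h, w = img.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

# Convolution as GEMM: a (1 x 9) kernel row times a (9 x 4) column matrix.
img = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))                # all-ones kernel: each output is a window sum
out = (kernel.ravel() @ im2col(img, 3, 3)).reshape(2, 2)
```

Sparsity in the kernel then maps to skippable rows of the GEMM, which is where a sparse systolic design saves work.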

Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices

E Reggiani, A Pappalardo, M Doblas… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Deep Neural Network (DNN) inference based on quantized narrow-precision integer data
represents a promising research direction toward efficient deep learning computations on …
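Mixed-precision integer inference of the kind the abstract describes can be sketched end to end (illustrative bit-widths and symmetric per-tensor scales; not Mix-GEMM's actual kernels or ISA extensions):

```python
import numpy as np

def quant(x, bits):
    """Symmetric per-tensor quantization to a signed `bits`-bit range."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).astype(np.int32), scale

def mixed_gemm(A, W, a_bits=8, w_bits=4):
    """Quantize activations and weights to different integer widths,
    multiply entirely in integer arithmetic, then rescale the int32
    accumulator back to floating point."""
    qa, sa = quant(A, a_bits)
    qw, sw = quant(W, w_bits)
    return (qa @ qw) * (sa * sw)
```

Keeping the accumulator wide (int32) while the operands stay narrow is what makes the integer datapath both cheap and numerically safe.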

Accelerating sparse convolution with column vector-wise sparsity

Y Tan, K Han, K Zhao, X Yu, Z Du… - Advances in …, 2022 - proceedings.neurips.cc
Weight sparsity is a promising approach to reducing the model size and computation cost of
convolutional neural networks (CNNs). Nevertheless, non-zero weights often distribute …
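Column vector-wise sparsity can be sketched as structured magnitude pruning over fixed-length column vectors (an illustrative reading of the title; the function name, vector length, and keep ratio are assumptions):

```python
import numpy as np

def prune_column_vectors(W, vec_len=4, keep_ratio=0.5):
    """Split every column of W into vectors of length vec_len and
    zero out the lowest-magnitude vectors in each column, keeping a
    keep_ratio fraction; surviving non-zeros stay contiguous, which
    is what makes the pattern hardware-friendly."""
    W = np.array(W, dtype=float)              # work on a copy
    rows, cols = W.shape
    assert rows % vec_len == 0
    Wv = W.reshape(rows // vec_len, vec_len, cols)
    norms = np.linalg.norm(Wv, axis=1)        # one norm per column vector
    k = int(np.ceil(keep_ratio * norms.shape[0]))
    keep = np.argsort(-norms, axis=0)[:k]     # top-k vectors per column
    mask = np.zeros_like(norms, dtype=bool)
    mask[keep, np.arange(cols)] = True
    Wv *= mask[:, None, :]                    # zero the pruned vectors
    return W
```

Unlike unstructured sparsity, whole zeroed vectors can be skipped with a single index, so the irregular-distribution problem the abstract mentions is sidestepped by construction.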

Prospects for integrated signal processing and advanced processing architectures

梁兴东, 李焱磊, 刘云龙, 郭宇豪, 解玉凤, 徐兴元… - 信号 …, 2022 - signal.ejournal.org.cn
Multifunctional integrated systems are an important development direction in electronic information technology, and integrated signal processing is a key enabling technology: it is of great significance for resource sharing and efficient coordination among functions, while also raising new … from algorithms to processing architectures

mNPUsim: Evaluating the effect of sharing resources in multi-core NPUs

S Hwang, S Lee, J Kim, H Kim… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Multi-core neural processing units (NPUs) have emerged to scale the computation capability
of NPUs to efficiently support diverse machine learning tasks. In such multi-core NPUs …