Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be …
C Guo, R Zhang, J Xu, J Leng, Z Liu, Z Huang… - Proceedings of the 29th …, 2024 - dl.acm.org
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However …
Approximate nearest neighbor (ANN) search is a widely applied technique in modern intelligent applications, such as recommendation systems and vector databases. Therefore …
By quantizing weights with different precision for different parts of a network, mixed-precision quantization promises to reduce the hardware cost and improve the speed of deep neural …
J Zhou, J Wu, Y Gao, Y Ding, C Tao, B Li… - … on Computer-Aided …, 2023 - ieeexplore.ieee.org
To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize the DNN models into …
V Leon, MA Hanif, G Armeniakos, X Jiao… - arXiv preprint arXiv …, 2023 - arxiv.org
The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing …
On-device Deep Neural Network (DNN) inference consumes significant computing resources and development efforts. To alleviate that, we propose LUT-NN, the first system to …
An activation function is an element-wise mathematical function and plays a crucial role in deep neural networks (DNN). Many novel and sophisticated activation functions have been …
Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy …