This chapter surveys approaches to quantizing the numerical values used in deep neural network computations, covering the advantages and disadvantages of current methods …
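To ground the discussion, the sketch below shows per-tensor asymmetric uniform quantization, the basic operation most of the methods surveyed here build on. The helper names and the 8-bit setting are illustrative choices, not taken from any particular paper.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Per-tensor asymmetric uniform quantization: floats -> integer codes."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The scale maps the observed float range onto the integer grid.
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = qmin - np.round(x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float tensor."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_uniform(x)
print(np.abs(x - dequantize(q, s, z)).max())  # worst-case quantization error
```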
Recent advances in machine learning achieved by deep neural networks (DNNs) have been significant. While demonstrating high accuracy, DNNs are associated with a …
Y He, L Liu, J Liu, W Wu, H Zhou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have recently dominated image synthesis and other related generative tasks. However, the iterative denoising process is computationally expensive at inference …
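The cost comes from the sampling loop itself: a DDPM-style sampler runs one full network forward pass per denoising step. The schematic below assumes a hypothetical noise-prediction callable `eps_model` and a simple linear beta schedule; both are illustrative, not from the paper above.

```python
import torch

@torch.no_grad()
def sample(eps_model, shape, T=1000, device="cpu"):
    """Schematic DDPM-style ancestral sampler: T sequential network
    evaluations, which is why diffusion inference is expensive."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        eps = eps_model(x, t)  # one full forward pass per step (hypothetical model)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```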
Large Language Models (LLMs) excel at NLP tasks, but their computational and memory demands hinder widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive …
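For context, QAT typically inserts "fake quantization" into the forward pass and trains through the non-differentiable rounding with the straight-through estimator (STE). A minimal PyTorch sketch, with symmetric per-tensor scaling as an illustrative choice:

```python
import torch

def fake_quant(x, num_bits=8):
    """Quantize-dequantize in the forward pass; the straight-through
    estimator passes gradients as if the rounding were the identity."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # STE: forward value is q, backward is dL/dx

w = torch.randn(16, 16, requires_grad=True)
loss = fake_quant(w).square().sum()
loss.backward()  # gradients flow through the rounding via the STE
```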
Most existing neural network pruning methods hand-craft their importance criteria and the structures to prune. This creates heavy and unintended dependencies on heuristics and …
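A representative example of such a hand-crafted criterion is ranking the output filters of a convolution by the L1 norm of their weights and keeping the top-k; the sketch below is illustrative, not the method of any specific paper here.

```python
import torch

def l1_filter_importance(conv_weight):
    """Classic hand-crafted criterion: score each output filter of a conv
    layer by the L1 norm of its weights."""
    # conv_weight shape: (out_channels, in_channels, kH, kW)
    return conv_weight.abs().sum(dim=(1, 2, 3))

w = torch.randn(64, 32, 3, 3)
scores = l1_filter_importance(w)
keep = torch.topk(scores, k=48).indices  # keep the 48 "most important" filters
pruned_w = w[keep]                       # shape (48, 32, 3, 3)
```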
Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …
N Zheng, B Lin, Q Zhang, L Ma, Y Yang… - … USENIX Symposium on …, 2022 - usenix.org
Sparsity is becoming arguably the most critical dimension to explore for efficiency and scalability, as deep learning models grow significantly larger and more complex. After all …
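One concrete, hardware-friendly instance is 2:4 semi-structured sparsity, where exactly two weights in every group of four are zeroed (the pattern accelerated on recent NVIDIA GPUs). A sketch of building such a mask, with illustrative names:

```python
import torch

def two_four_mask(w):
    """Build a 2:4 semi-structured sparsity mask: in every group of four
    consecutive weights, keep the two with the largest magnitude."""
    flat = w.reshape(-1, 4)
    idx = flat.abs().topk(2, dim=1).indices  # two largest per group
    mask = torch.zeros_like(flat)
    mask.scatter_(1, idx, 1.0)
    return mask.reshape(w.shape)

w = torch.randn(8, 16)           # last dim divisible by 4
sparse_w = w * two_four_mask(w)  # exactly 50% zeros in a regular pattern
```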
Pruning is an effective method to reduce the memory footprint and FLOPs associated with neural network models. However, existing structured pruning methods often result in …
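Structured pruning reduces FLOPs because removing output channels from one layer also shrinks the input side of the layer that consumes them, so both layers get physically smaller. A minimal sketch, with hypothetical layer sizes and names:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(32, 64, 3, padding=1)
conv2 = nn.Conv2d(64, 128, 3, padding=1)

# Keep the 48 filters of conv1 with the largest L1 norm.
keep = torch.topk(conv1.weight.detach().abs().sum(dim=(1, 2, 3)), k=48).indices

slim1 = nn.Conv2d(32, 48, 3, padding=1)
slim1.weight.data = conv1.weight.data[keep]
slim1.bias.data = conv1.bias.data[keep]

# conv2 must drop the matching input channels.
slim2 = nn.Conv2d(48, 128, 3, padding=1)
slim2.weight.data = conv2.weight.data[:, keep]
slim2.bias.data = conv2.bias.data.clone()

x = torch.randn(1, 32, 56, 56)
assert slim2(slim1(x)).shape == conv2(conv1(x)).shape  # same output shape, fewer FLOPs
```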
T Han, D Li, J Liu, L Tian… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Model quantization is an important mechanism for energy-efficient deployment of deep neural networks on resource-constrained devices by reducing the bit precision of …
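In practice, bit precision is often reduced with one scale per output channel rather than one per tensor, since per-channel scales track the weight distribution more closely. A sketch of symmetric per-channel INT8 weight quantization, with illustrative names:

```python
import torch

def quantize_per_channel(w, num_bits=8):
    """Symmetric per-channel weight quantization: one scale per output
    channel (dim 0), typically more accurate than a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().amax(dim=tuple(range(1, w.dim())), keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

w = torch.randn(64, 32, 3, 3)
q, s = quantize_per_channel(w)
w_hat = q.float() * s               # dequantized approximation
print((w - w_hat).abs().max())      # worst-case per-channel error
```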