Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Hardware acceleration of sparse and irregular tensor computations of ml models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computation- and memory-intensive applications, tensors of these …

Training transformers with 4-bit integers

H Xi, C Li, J Chen, J Zhu - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Quantizing activations, weights, and gradients to 4 bits is a promising way to accelerate neural
network training. However, existing 4-bit training methods require custom numerical formats …

Rsc: accelerate graph neural networks training via randomized sparse computations

Z Liu, C Shengyuan, K Zhou, D Zha… - International …, 2023 - proceedings.mlr.press
Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-
based operations are hard to accelerate on commodity hardware. Prior art successfully …

Winner-take-all column row sampling for memory efficient adaptation of language model

Z Liu, G Wang, SH Zhong, Z Xu, D Zha… - Advances in …, 2024 - proceedings.neurips.cc
As model sizes grow rapidly, fine-tuning large pre-trained language models has
become increasingly difficult due to their extensive memory usage. Previous works usually …

L2ight: Enabling on-chip learning for optical neural networks via efficient in-situ subspace optimization

J Gu, H Zhu, C Feng, Z Jiang… - Advances in Neural …, 2021 - proceedings.neurips.cc
The silicon-photonics-based optical neural network (ONN) is a promising hardware platform that
could represent a paradigm shift in efficient AI with its CMOS compatibility, flexibility, ultra …

Dictionary-enabled efficient training of ConvNets for image classification

U Haider, M Hanif, A Rashid, SF Hussain - Image and Vision Computing, 2023 - Elsevier
Convolutional networks (ConvNets) are computationally expensive but well known for their
performance on image data. One way to reduce their complexity is to explore inherited data …

Randomized automatic differentiation

D Oktay, N McGreivy, J Aduol, A Beatson… - arXiv preprint arXiv …, 2020 - arxiv.org
The successes of deep learning, variational inference, and many other fields have been
aided by specialized implementations of reverse-mode automatic differentiation (AD) to …

Adaptive deep reuse: Accelerating CNN training on the fly

L Ning, H Guan, X Shen - 2019 IEEE 35th International …, 2019 - ieeexplore.ieee.org
This work proposes adaptive deep reuse, a method for accelerating CNN training by
identifying and avoiding the unnecessary computations contained in each specific training …