Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Hardware acceleration of sparse and irregular tensor computations of ml models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computation- and memory-intensive applications, tensors of these …

Training transformers with 4-bit integers

H Xi, C Li, J Chen, J Zhu - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Quantizing activations, weights, and gradients to 4 bits is a promising way to accelerate neural
network training. However, existing 4-bit training methods require custom numerical formats …

Rsc: accelerate graph neural networks training via randomized sparse computations

Z Liu, C Shengyuan, K Zhou, D Zha… - International …, 2023 - proceedings.mlr.press
Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-
based operations are hard to accelerate on commodity hardware. Prior art successfully …

Winner-take-all column row sampling for memory efficient adaptation of language model

Z Liu, G Wang, SH Zhong, Z Xu, D Zha… - Advances in …, 2024 - proceedings.neurips.cc
As model sizes grow rapidly, fine-tuning large pre-trained language models has
become increasingly difficult due to their extensive memory usage. Previous works usually …

L2ight: Enabling on-chip learning for optical neural networks via efficient in-situ subspace optimization

J Gu, H Zhu, C Feng, Z Jiang… - Advances in Neural …, 2021 - proceedings.neurips.cc
The silicon-photonics-based optical neural network (ONN) is a promising hardware platform that
could represent a paradigm shift in efficient AI with its CMOS compatibility, flexibility, ultra …

Dictionary-enabled efficient training of ConvNets for image classification

U Haider, M Hanif, A Rashid, SF Hussain - Image and Vision Computing, 2023 - Elsevier
Convolutional networks (ConvNets) are computationally expensive but well known for their
performance on image data. One way to reduce their complexity is to explore inherited data …

Randomized automatic differentiation

D Oktay, N McGreivy, J Aduol, A Beatson… - arXiv preprint arXiv …, 2020 - arxiv.org
The successes of deep learning, variational inference, and many other fields have been
aided by specialized implementations of reverse-mode automatic differentiation (AD) to …

Adaptive deep reuse: Accelerating CNN training on the fly

L Ning, H Guan, X Shen - 2019 IEEE 35th International …, 2019 - ieeexplore.ieee.org
This work proposes adaptive deep reuse, a method for accelerating CNN training by
identifying and avoiding the unnecessary computations contained in each specific training …