Only train once: A one-shot neural network training and pruning framework

T Chen, B Ji, T Ding, B Fang, G Wang… - Advances in …, 2021 - proceedings.neurips.cc
Structured pruning is a commonly used technique in deploying deep neural networks
(DNNs) onto resource-constrained devices. However, the existing pruning methods are …

OTOv2: Automatic, generic, user-friendly

T Chen, L Liang, T Ding, Z Zhu, I Zharkov - arXiv preprint arXiv …, 2023 - arxiv.org
The existing model compression methods via structured pruning typically require
complicated multi-stage procedures. Each individual stage necessitates numerous …

Sparsity in transformers: A systematic literature review

M Farina, U Ahmad, A Taha, H Younes, Y Mesbah… - Neurocomputing, 2024 - Elsevier
Transformers have become the state-of-the-art architectures for various tasks in Natural
Language Processing (NLP) and Computer Vision (CV); however, their space and …

Learning pruning-friendly networks via Frank-Wolfe: One-shot, any-sparsity, and no retraining

M Lu, X Luo, T Chen, W Chen, D Liu… - … Conference on Learning …, 2022 - openreview.net
We present a novel framework to train a large deep neural network (DNN) only once, which can then be pruned to any sparsity ratio to preserve competitive …
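The snippet names Frank-Wolfe as the training method but shows none of it; as a hedged illustration (not the paper's actual algorithm), the sketch below is a generic Frank-Wolfe update over an ℓ1-ball, whose linear minimization oracle activates at most one coordinate per step and therefore keeps iterates sparse. The function name, radius, and step schedule are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_step(w, grad, radius, step_size):
    """One Frank-Wolfe update over the l1-ball {w : ||w||_1 <= radius}.

    The linear minimization oracle for the l1-ball puts all mass on the
    coordinate with the largest-magnitude gradient entry, which is why
    Frank-Wolfe iterates stay sparse (at most one new nonzero per step).
    """
    i = np.argmax(np.abs(grad))           # coordinate of steepest descent
    s = np.zeros_like(w)
    s[i] = -radius * np.sign(grad[i])     # vertex of the l1-ball
    return (1.0 - step_size) * w + step_size * s  # convex combination

# Toy usage: minimize f(w) = 0.5 * ||w - target||^2 over the l1-ball.
target = np.array([3.0, -1.0, 0.5])
w = np.zeros(3)
for t in range(200):
    grad = w - target                     # gradient of the quadratic
    w = frank_wolfe_step(w, grad, radius=2.0, step_size=2.0 / (t + 2))
print(w)  # sparse iterate inside the l1-ball
```

The classic 2/(t+2) step schedule is used here only because it needs no line search; the paper's training procedure may differ.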

The contextual lasso: Sparse linear models via deep neural networks

R Thompson, A Dezfouli… - Advances in Neural …, 2023 - proceedings.neurips.cc
Sparse linear models are one of several core tools for interpretable machine learning, a field
of emerging importance as predictive models permeate decision-making in many domains …
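As background for the sparse-linear-model building block this entry refers to, here is a minimal lasso fit with scikit-learn. The contextual lasso itself makes the sparse coefficients depend on contextual features through a neural network, which this sketch does not attempt; the data and penalty strength are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# A plain lasso as the sparse-linear-model building block: the l1 penalty
# drives some coefficients exactly to zero, which is the source of the
# interpretability the abstract refers to.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta = np.array([2.0, 0, 0, -1.5, 0, 0, 0, 0.5, 0, 0])  # truly sparse signal
y = X @ beta + 0.1 * rng.normal(size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # most entries are exactly zero
```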

Less is More–Towards parsimonious multi-task models using structured sparsity

R Upadhyay, R Phlypo, R Saini… - … on Parsimony and …, 2024 - proceedings.mlr.press
Model sparsification in deep learning promotes simpler, more interpretable models
with fewer parameters. This not only reduces the model's memory footprint and …

Training structured neural networks through manifold identification and variance reduction

ZS Huang, C Lee - arXiv preprint arXiv:2112.02612, 2021 - arxiv.org
This paper proposes an algorithm (RMDA) for training neural networks (NNs) with a
regularization term for promoting desired structures. RMDA does not incur computation …
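RMDA's own update is not reproduced in the snippet; as a hedged sketch of the kind of structure-promoting regularization it mentions, the code below applies the proximal map of a group-lasso penalty, which zeroes whole groups of weights at once. The group layout and penalty weight are illustrative assumptions, not details of RMDA.

```python
import numpy as np

def group_prox(w, groups, lam):
    """Proximal map of the group-lasso penalty lam * sum_g ||w_g||_2.

    Entire groups are zeroed at once, which is what 'structured' sparsity
    means here: whole neurons/channels vanish rather than single weights.
    """
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out

w = np.array([0.1, -0.2, 3.0, 2.5])
groups = [np.array([0, 1]), np.array([2, 3])]
print(group_prox(w, groups, lam=0.5))  # first group collapses to zero
```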

Modeling ideological salience and framing in polarized online groups with graph neural networks and structured sparsity

V Hofmann, X Dong, JB Pierrehumbert… - arXiv preprint arXiv …, 2021 - arxiv.org
The increasing polarization of online political discourse calls for computational tools that
automatically detect and monitor ideological divides in social media. We introduce a …

Proximal methods for nonconvex composite optimization problems

T Lechner - 2022 - opus.bibliothek.uni-wuerzburg.de
Optimization problems with composite functions deal with the minimization of the sum of a
smooth function and a convex nonsmooth function. In this thesis several numerical methods …
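The thesis entry states the composite template precisely: minimize a smooth function f plus a convex nonsmooth function g. A minimal proximal-gradient sketch for that template, assuming g(x) = λ‖x‖₁ so the prox reduces to soft-thresholding (the problem data are synthetic, and this is one standard method, not the thesis's specific algorithms):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (coordinate-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient(grad_f, prox_g, x0, step, n_iter=500):
    """Minimize f(x) + g(x): gradient step on smooth f, prox step on nonsmooth g."""
    x = x0
    for _ in range(n_iter):
        x = prox_g(x - step * grad_f(x), step)
    return x

# Toy composite problem: f(x) = 0.5 * ||Ax - b||^2 (smooth), g(x) = lam * ||x||_1.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 20))
b = rng.normal(size=50)
lam = 0.5
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant of grad f

x_hat = proximal_gradient(
    grad_f=lambda x: A.T @ (A @ x - b),
    prox_g=lambda x, t: soft_threshold(x, lam * t),
    x0=np.zeros(20),
    step=step,
)
print(np.count_nonzero(x_hat), "nonzero coordinates")
```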

HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning

T Chen, X Qu, D Aponte, C Banbury, J Ko… - arXiv preprint arXiv …, 2024 - arxiv.org
Structured pruning is one of the most popular approaches to effectively compress the heavy
deep neural networks (DNNs) into compact sub-networks while retaining performance. The …