A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - arXiv preprint arXiv:2308.06767, 2023 - arxiv.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

SPViT: Enabling faster vision transformers via latency-aware soft token pruning

Z Kong, P Dong, X Ma, X Meng, W Niu, M Sun… - European conference on …, 2022 - Springer
Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …

Advancing model pruning via bi-level optimization

Y Zhang, Y Yao, P Ram, P Zhao… - Advances in …, 2022 - proceedings.neurips.cc
The deployment constraints in practical applications necessitate the pruning of large-scale
deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket …

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Trainability preserving neural pruning

H Wang, Y Fu - arXiv preprint arXiv:2207.12534, 2022 - arxiv.org
Many recent works have shown that trainability plays a central role in neural network pruning:
broken trainability, if left unattended, can lead to severe under-performance and unintentionally …

Exposing and exploiting fine-grained block structures for fast and accurate sparse training

P Jiang, L Hu, S Song - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Sparse training is a popular technique to reduce the overhead of training large models.
Although previous work has shown promising results for nonstructured sparse models, it is …

Towards sparsification of graph neural networks

H Peng, D Gurevin, S Huang, T Geng… - 2022 IEEE 40th …, 2022 - ieeexplore.ieee.org
As real-world graphs expand in size, larger GNN models with billions of parameters are
deployed. The high parameter count of such models makes training and inference on graphs …

Dynamic sparsity is channel-level sparsity learner

L Yin, G Li, M Fang, L Shen, T Huang… - Advances in …, 2024 - proceedings.neurips.cc
Sparse training has received surging interest in machine learning due to its tantalizing
potential for savings in both the entire training process and inference. Dynamic sparse …

Audio lottery: Speech recognition made ultra-lightweight, noise-robust, and transferable

S Ding, T Chen, Z Wang - International Conference on Learning …, 2022 - par.nsf.gov
Lightweight speech recognition models have seen explosive demand owing to the growing
number of speech-interactive features on mobile devices. Since designing such systems …

Sparsity winning twice: Better robust generalization from more efficient training

T Chen, Z Zhang, P Wang, S Balachandra… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent studies demonstrate that deep networks, even robustified by the state-of-the-art
adversarial training (AT), still suffer from large robust generalization gaps, in addition to the …