AutoFormer: Searching transformers for visual recognition

M Chen, H Peng, J Fu, H Ling - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Recently, pure transformer-based models have shown great potential for vision tasks such
as image classification and detection. However, the design of transformer networks is …

Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

NORM: Knowledge distillation via N-to-one representation matching

X Liu, L Li, C Li, A Yao - arXiv preprint arXiv:2305.13803, 2023 - arxiv.org
Existing feature distillation methods commonly adopt one-to-one representation matching
between a pre-selected teacher-student layer pair. In this paper, we present N …
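
As a point of reference for the one-to-one matching this entry contrasts against, the sketch below shows a generic feature-distillation loss in PyTorch: a student feature map is projected to the teacher's channel width and matched with MSE. The shapes and the 1x1 projection are illustrative assumptions; this is not the paper's N-to-one NORM method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic one-to-one feature distillation (the baseline setting this paper
# departs from). Shapes and the 1x1 projection are illustrative assumptions,
# not NORM's N-to-one matching.
class FeatureDistillLoss(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv aligns the student's channel width with the teacher's
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # match projected student features to frozen teacher features
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())

loss_fn = FeatureDistillLoss(student_channels=64, teacher_channels=256)
s = torch.randn(2, 64, 8, 8)   # student feature map
t = torch.randn(2, 256, 8, 8)  # teacher feature map
print(loss_fn(s, t).item())
```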

Towards efficient post-training quantization of pre-trained language models

H Bai, L Hou, L Shang, X Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Network quantization has gained increasing attention with the rapid growth of large pre-
trained language models (PLMs). However, most existing quantization methods for PLMs …
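
To make the quantization setting concrete, here is a minimal sketch of uniform symmetric post-training quantization of a single weight matrix to int8. It only illustrates the round-to-nearest PTQ baseline; the paper's specific calibration and optimization procedure is not reproduced.

```python
import torch

# Minimal round-to-nearest int8 quantization of one weight tensor.
# Illustrates the general post-training quantization idea only; the scale
# choice and per-tensor granularity are simplifying assumptions.
def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                      # per-tensor scale
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(768, 768)                              # e.g. one PLM weight matrix
q, scale = quantize_int8(w)
print("max abs error:", (dequantize(q, scale) - w).abs().max().item())
```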

ShiftAddNet: A hardware-inspired deep network

H You, X Chen, Y Zhang, C Li, S Li… - Advances in …, 2020 - proceedings.neurips.cc
Multiplication (e.g., convolution) is arguably a cornerstone of modern deep neural networks
(DNNs). However, intensive multiplications cause expensive resource costs that challenge …
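
The core hardware motivation, replacing expensive multiplications with cheap bit shifts and additions, can be illustrated in a few lines: round a weight to a signed power of two so that multiplying by it becomes a shift. This sketches the principle only, not ShiftAddNet's actual layer design or training scheme.

```python
import math

# Multiply by a weight rounded to a signed power of two using only a bit shift.
# Illustrates the shift-and-add principle; not ShiftAddNet's actual layers.
def quantize_to_power_of_two(w: float):
    if w == 0:
        return 1, None
    sign = -1 if w < 0 else 1
    exponent = round(math.log2(abs(w)))    # nearest power-of-two exponent
    return sign, exponent

def shift_multiply(x: int, sign: int, exponent) -> int:
    if exponent is None:
        return 0
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return sign * shifted

sign, exp = quantize_to_power_of_two(0.23)  # 0.23 rounds to 2**-2 = 0.25
print(shift_multiply(64, sign, exp))        # 64 * 0.25 -> 16
```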

Partially does it: Towards scene-level FG-SBIR with partial input

PN Chowdhury, AK Bhunia… - Proceedings of the …, 2022 - openaccess.thecvf.com
We scrutinise an important observation plaguing scene-level sketch research: that a
significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene …

Network augmentation for tiny deep learning

H Cai, C Gan, J Lin, S Han - arXiv preprint arXiv:2110.08890, 2021 - arxiv.org
We introduce Network Augmentation (NetAug), a new training method for improving the
performance of tiny neural networks. Existing regularization techniques (e.g., data …

StackRec: Efficient training of very deep sequential recommender models by iterative stacking

J Wang, F Yuan, J Chen, Q Wu, M Yang, Y Sun… - Proceedings of the 44th …, 2021 - dl.acm.org
Deep learning has brought great progress to sequential recommendation (SR) tasks.
With advanced network architectures, sequential recommender models can be stacked with …
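
The iterative-stacking idea of warm-starting a deeper model from a shallower trained one can be sketched as copying the trained blocks to double the depth before continuing training. The block type and sizes below are illustrative assumptions, not StackRec's exact stacking schedule.

```python
import copy
import torch.nn as nn

# Grow a model by stacking: duplicate trained blocks so the deeper model
# starts from the shallow model's weights. Block choice and hyperparameters
# are illustrative assumptions.
def stack_layers(layers: nn.ModuleList) -> nn.ModuleList:
    grown = list(layers) + [copy.deepcopy(layer) for layer in layers]
    return nn.ModuleList(grown)

shallow = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(2)]
)
deep = stack_layers(shallow)  # 2 blocks -> 4 blocks, weights copied
print(len(shallow), "->", len(deep))
```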

Elephant neural networks: Born to be a continual learner

Q Lan, AR Mahmood - arXiv preprint arXiv:2310.01365, 2023 - arxiv.org
Catastrophic forgetting has remained a significant challenge for continual learning for decades.
While recent works have proposed effective methods to mitigate this problem, they mainly …

Greedy optimization provably wins the lottery: Logarithmic number of winning tickets is enough

M Ye, L Wu, Q Liu - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Despite the great success of deep learning, recent works show that large deep neural
networks are often highly redundant and can be significantly reduced in size. However, the …
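
As a concrete illustration of the redundancy claim (not of the paper's greedy forward-selection method), the sketch below zeroes out the smallest-magnitude weights of one layer, a standard magnitude-pruning baseline.

```python
import torch
import torch.nn as nn

# Magnitude pruning of one linear layer: zero the smallest-magnitude weights.
# This only illustrates that large networks are highly redundant; it is not
# the paper's greedy optimization procedure for winning tickets.
def magnitude_prune(layer: nn.Linear, sparsity: float = 0.9) -> nn.Linear:
    with torch.no_grad():
        w = layer.weight
        k = int(w.numel() * sparsity)
        threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
        layer.weight.mul_((w.abs() > threshold).float())  # keep only larger weights
    return layer

layer = magnitude_prune(nn.Linear(512, 512), sparsity=0.9)
print("nonzero fraction:", (layer.weight != 0).float().mean().item())
```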