AutoFormer: Searching transformers for visual recognition

M Chen, H Peng, J Fu, H Ling - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Recently, pure transformer-based models have shown great potential for vision tasks such
as image classification and detection. However, the design of transformer networks is …

Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

NORM: Knowledge distillation via N-to-one representation matching

X Liu, L Li, C Li, A Yao - arXiv preprint arXiv:2305.13803, 2023 - arxiv.org
Existing feature distillation methods commonly adopt one-to-one representation matching
between a pre-selected teacher-student layer pair. In this paper, we present N …
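
As a point of reference for the one-to-one matching this entry contrasts against, the sketch below shows a generic feature-distillation loss in PyTorch: a student feature map is projected to the teacher's channel width and matched with MSE. The shapes and the 1x1 projection are illustrative assumptions; this is not the paper's N-to-one NORM method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic one-to-one feature distillation (the baseline setting this paper
# departs from). Shapes and the 1x1 projection are illustrative assumptions,
# not NORM's N-to-one matching.
class FeatureDistillLoss(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv aligns the student's channel width with the teacher's
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # match projected student features to frozen teacher features
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())

loss_fn = FeatureDistillLoss(student_channels=64, teacher_channels=256)
s = torch.randn(2, 64, 8, 8)   # student feature map
t = torch.randn(2, 256, 8, 8)  # teacher feature map
print(loss_fn(s, t).item())
```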

Towards efficient post-training quantization of pre-trained language models

H Bai, L Hou, L Shang, X Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Network quantization has gained increasing attention with the rapid growth of large pre-
trained language models (PLMs). However, most existing quantization methods for PLMs …
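
To make the quantization setting concrete, here is a minimal sketch of uniform symmetric post-training quantization of a single weight matrix to int8. It only illustrates the round-to-nearest PTQ baseline; the paper's specific calibration and optimization procedure is not reproduced.

```python
import torch

# Minimal round-to-nearest int8 quantization of one weight tensor.
# Illustrates the general post-training quantization idea only; the scale
# choice and per-tensor granularity are simplifying assumptions.
def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                      # per-tensor scale
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(768, 768)                              # e.g. one PLM weight matrix
q, scale = quantize_int8(w)
print("max abs error:", (dequantize(q, scale) - w).abs().max().item())
```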

ShiftAddNet: A hardware-inspired deep network

H You, X Chen, Y Zhang, C Li, S Li… - Advances in …, 2020 - proceedings.neurips.cc
Multiplication (e.g., convolution) is arguably a cornerstone of modern deep neural networks
(DNNs). However, intensive multiplications cause expensive resource costs that challenge …
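
The core hardware motivation, replacing expensive multiplications with cheap bit shifts and additions, can be illustrated in a few lines: round a weight to a signed power of two so that multiplying by it becomes a shift. This sketches the principle only, not ShiftAddNet's actual layer design or training scheme.

```python
import math

# Multiply by a weight rounded to a signed power of two using only a bit shift.
# Illustrates the shift-and-add principle; not ShiftAddNet's actual layers.
def quantize_to_power_of_two(w: float):
    if w == 0:
        return 1, None
    sign = -1 if w < 0 else 1
    exponent = round(math.log2(abs(w)))    # nearest power-of-two exponent
    return sign, exponent

def shift_multiply(x: int, sign: int, exponent) -> int:
    if exponent is None:
        return 0
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return sign * shifted

sign, exp = quantize_to_power_of_two(0.23)  # 0.23 rounds to 2**-2 = 0.25
print(shift_multiply(64, sign, exp))        # 64 * 0.25 -> 16
```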

Partially does it: Towards scene-level FG-SBIR with partial input

PN Chowdhury, AK Bhunia… - Proceedings of the …, 2022 - openaccess.thecvf.com
We scrutinise an important observation plaguing scene-level sketch research: that a
significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene …

Network augmentation for tiny deep learning

H Cai, C Gan, J Lin, S Han - arXiv preprint arXiv:2110.08890, 2021 - arxiv.org
We introduce Network Augmentation (NetAug), a new training method for improving the
performance of tiny neural networks. Existing regularization techniques (e.g., data …

StackRec: Efficient training of very deep sequential recommender models by iterative stacking

J Wang, F Yuan, J Chen, Q Wu, M Yang, Y Sun… - Proceedings of the 44th …, 2021 - dl.acm.org
Deep learning has brought great progress to sequential recommendation (SR) tasks.
With advanced network architectures, sequential recommender models can be stacked with …
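
The iterative-stacking idea of warm-starting a deeper model from a shallower trained one can be sketched as copying the trained blocks to double the depth before continuing training. The block type and sizes below are illustrative assumptions, not StackRec's exact stacking schedule.

```python
import copy
import torch.nn as nn

# Grow a model by stacking: duplicate trained blocks so the deeper model
# starts from the shallow model's weights. Block choice and hyperparameters
# are illustrative assumptions.
def stack_layers(layers: nn.ModuleList) -> nn.ModuleList:
    grown = list(layers) + [copy.deepcopy(layer) for layer in layers]
    return nn.ModuleList(grown)

shallow = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(2)]
)
deep = stack_layers(shallow)  # 2 blocks -> 4 blocks, weights copied
print(len(shallow), "->", len(deep))
```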

Elephant neural networks: Born to be a continual learner

Q Lan, AR Mahmood - arXiv preprint arXiv:2310.01365, 2023 - arxiv.org
Catastrophic forgetting has remained a significant challenge for continual learning for decades.
While recent works have proposed effective methods to mitigate this problem, they mainly …

Greedy optimization provably wins the lottery: Logarithmic number of winning tickets is enough

M Ye, L Wu, Q Liu - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Despite the great success of deep learning, recent works show that large deep neural
networks are often highly redundant and can be significantly reduced in size. However, the …
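
As a concrete illustration of the redundancy claim (not of the paper's greedy forward-selection method), the sketch below zeroes out the smallest-magnitude weights of one layer, a standard magnitude-pruning baseline.

```python
import torch
import torch.nn as nn

# Magnitude pruning of one linear layer: zero the smallest-magnitude weights.
# This only illustrates that large networks are highly redundant; it is not
# the paper's greedy optimization procedure for winning tickets.
def magnitude_prune(layer: nn.Linear, sparsity: float = 0.9) -> nn.Linear:
    with torch.no_grad():
        w = layer.weight
        k = int(w.numel() * sparsity)
        threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
        layer.weight.mul_((w.abs() > threshold).float())  # keep only larger weights
    return layer

layer = magnitude_prune(nn.Linear(512, 512), sparsity=0.9)
print("nonzero fraction:", (layer.weight != 0).float().mean().item())
```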