Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have …
X Liu, L Li, C Li, A Yao - arXiv preprint arXiv:2305.13803, 2023 - arxiv.org
Existing feature distillation methods commonly adopt the One-to-one Representation Matching between any pre-selected teacher-student layer pair. In this paper, we present N …
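The snippet above describes the setup that such methods share: each pre-selected student layer is trained to mimic the hidden states of one fixed teacher layer. Below is a minimal PyTorch sketch of that one-to-one representation matching loss; the layer pairing, the linear projection, and the tensor shapes are illustrative assumptions, not details taken from the cited paper.

```python
# Sketch of one-to-one feature distillation: each pre-selected student layer
# is matched to a single fixed teacher layer via an MSE loss on hidden states.
# Layer pairs, projection, and dimensions below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneToOneFeatureDistillation(nn.Module):
    def __init__(self, layer_pairs, student_dim, teacher_dim):
        """layer_pairs: list of (student_layer_idx, teacher_layer_idx)."""
        super().__init__()
        self.layer_pairs = layer_pairs
        # Linear projection to align the student width with the teacher width.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden, teacher_hidden):
        """Both arguments are lists of [batch, seq, dim] hidden states."""
        loss = 0.0
        for s_idx, t_idx in self.layer_pairs:
            s = self.proj(student_hidden[s_idx])
            t = teacher_hidden[t_idx].detach()  # teacher features are frozen
            loss = loss + F.mse_loss(s, t)
        return loss / len(self.layer_pairs)

# Usage with dummy hidden states: a 4-layer student matched to an 8-layer teacher.
student_h = [torch.randn(2, 16, 256) for _ in range(4)]
teacher_h = [torch.randn(2, 16, 768) for _ in range(8)]
distill = OneToOneFeatureDistillation([(0, 1), (1, 3), (2, 5), (3, 7)], 256, 768)
print(distill(student_h, teacher_h))
```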
Network quantization has gained increasing attention with the rapid growth of large pre-trained language models (PLMs). However, most existing quantization methods for PLMs …
Multiplication (e.g., convolution) is arguably a cornerstone of modern deep neural networks (DNNs). However, intensive multiplications cause expensive resource costs that challenge …
We scrutinise an important observation plaguing scene-level sketch research -- that a significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene …
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks. Existing regularization techniques (e.g., data …
J Wang, F Yuan, J Chen, Q Wu, M Yang, Y Sun… - Proceedings of the 44th …, 2021 - dl.acm.org
Deep learning has brought great progress for sequential recommendation (SR) tasks. With advanced network architectures, sequential recommender models can be stacked with …
Catastrophic forgetting has remained a significant challenge for continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly …
M Ye, L Wu, Q Liu - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Despite the great success of deep learning, recent works show that large deep neural networks are often highly redundant and can be significantly reduced in size. However, the …