Abstract: Knowledge distillation is a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during …
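The teacher-student coupling referred to in this snippet is most often implemented as a blended loss over softened teacher outputs and ground-truth labels. The following is a minimal PyTorch sketch of that standard formulation, not the specific method of the paper above; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher guidance) with hard-label cross-entropy."""
    # Soft targets: teacher probabilities at temperature T, detached so no
    # gradient flows back into the teacher.
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes comparable
    # across temperatures.
    kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    # Standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```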
With the ever-growing complexity of models in the field of remote sensing (RS), there is an increasing demand for solutions that balance model accuracy with computational efficiency …
How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable …
Incrementally expanding the capability of an existing translation model to solve new domain tasks over time is a fundamental and practical problem, which usually suffers from …
F Wang, J Yan, F Meng, J Zhou - arXiv preprint arXiv:2105.12967, 2021 - arxiv.org
Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely …
In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar …
Self-supervised learning (SSL) is a promising approach that achieves strong performance on several speech downstream tasks. Since the parameters of SSL models are generally so …
Intermediate layer matching has been shown to be an effective approach for improving knowledge distillation (KD). However, this technique applies matching in the hidden spaces of two …
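Intermediate layer matching, as described in this snippet, typically projects a chosen student hidden layer into the teacher's hidden dimension and penalises the distance between the two representations. The sketch below is a generic illustration of that idea, assuming transformer-style hidden states of shape (batch, sequence, dim); it is not the specific matching scheme of the paper above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HiddenStateMatcher(nn.Module):
    """Project a student hidden layer into the teacher's width and penalise the MSE."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Learned linear map to bridge the dimensionality gap between the models.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden: torch.Tensor, teacher_hidden: torch.Tensor) -> torch.Tensor:
        # student_hidden: (batch, seq, student_dim); teacher_hidden: (batch, seq, teacher_dim)
        # The teacher representation is detached so only the student (and the
        # projection) receive gradients.
        return F.mse_loss(self.proj(student_hidden), teacher_hidden.detach())
```

In practice this auxiliary loss is added, with a small weight, to the usual logit-level distillation objective such as the one sketched earlier.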
Y Wang, Y Zhang, M Coates - Proceedings of the 30th ACM International …, 2021 - dl.acm.org
Personalized recommender systems are playing an increasingly important role for online services. Graph Neural Network (GNN) based recommender models have demonstrated a …