DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation

M Valipour, M Rezagholizadeh, I Kobyzev… - arXiv preprint arXiv …, 2022 - arxiv.org
With the ever-growing size of pretrained models (PMs), fine-tuning them has become more
expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main …

ALP-KD: Attention-based layer projection for knowledge distillation

P Passban, Y Wu, M Rezagholizadeh… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Knowledge distillation is considered a training and compression strategy in
which two neural networks, namely a teacher and a student, are coupled together during …

Applications of knowledge distillation in remote sensing: A survey

Y Himeur, N Aburaed, O Elharrouss, I Varlamis… - Information …, 2024 - Elsevier
With the ever-growing complexity of models in the field of remote sensing (RS), there is an
increasing demand for solutions that balance model accuracy with computational efficiency …

A comprehensive survey of compression algorithms for language models

S Park, J Choi, S Lee, U Kang - arXiv preprint arXiv:2401.15347, 2024 - arxiv.org
How can we compress language models without sacrificing accuracy? The number of
compression algorithms for language models is rapidly growing to benefit from remarkable …

Continual learning with semi-supervised contrastive distillation for incremental neural machine translation

Y Liang, F Meng, J Wang, J Xu, Y Chen… - Proceedings of the …, 2024 - aclanthology.org
Incrementally expanding the capability of an existing translation model to solve new domain
tasks over time is a fundamental and practical problem, which usually suffers from …

Selective knowledge distillation for neural machine translation

F Wang, J Yan, F Meng, J Zhou - arXiv preprint arXiv:2105.12967, 2021 - arxiv.org
Neural Machine Translation (NMT) models achieve state-of-the-art performance on many
translation benchmarks. As an active research field in NMT, knowledge distillation is widely …

SMaLL-100: Introducing shallow multilingual machine translation model for low-resource languages

A Mohammadshahi, V Nikoulina, A Berard… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, multilingual machine translation models have achieved promising
performance on low-resource language pairs by sharing information between similar …

Deep versus wide: An analysis of student architectures for task-agnostic knowledge distillation of self-supervised speech models

T Ashihara, T Moriya, K Matsuura, T Tanaka - arXiv preprint arXiv …, 2022 - arxiv.org
Self-supervised learning (SSL) is seen as a very promising approach with high performance
for several speech downstream tasks. Since the parameters of SSL models are generally so …

Universal-KD: Attention-based output-grounded intermediate layer knowledge distillation

Y Wu, M Rezagholizadeh, A Ghaddar… - Proceedings of the …, 2021 - aclanthology.org
Intermediate layer matching is shown to be an effective approach for improving knowledge
distillation (KD). However, this technique applies matching in the hidden spaces of two …

Graph structure aware contrastive knowledge distillation for incremental learning in recommender systems

Y Wang, Y Zhang, M Coates - Proceedings of the 30th ACM International …, 2021 - dl.acm.org
Personalized recommender systems are playing an increasingly important role in online
services. Graph Neural Network (GNN) based recommender models have demonstrated a …