DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation

M Valipour, M Rezagholizadeh, I Kobyzev… - arXiv preprint arXiv …, 2022 - arxiv.org
With the ever-growing size of pretrained models (PMs), fine-tuning them has become more
expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main …

ALP-KD: Attention-based layer projection for knowledge distillation

P Passban, Y Wu, M Rezagholizadeh… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Knowledge distillation is considered a training and compression strategy in
which two neural networks, namely a teacher and a student, are coupled together during …

Applications of knowledge distillation in remote sensing: A survey

Y Himeur, N Aburaed, O Elharrouss, I Varlamis… - Information …, 2024 - Elsevier
With the ever-growing complexity of models in the field of remote sensing (RS), there is an
increasing demand for solutions that balance model accuracy with computational efficiency …

A comprehensive survey of compression algorithms for language models

S Park, J Choi, S Lee, U Kang - arXiv preprint arXiv:2401.15347, 2024 - arxiv.org
How can we compress language models without sacrificing accuracy? The number of
compression algorithms for language models is rapidly growing to benefit from remarkable …

Continual learning with semi-supervised contrastive distillation for incremental neural machine translation

Y Liang, F Meng, J Wang, J Xu, Y Chen… - Proceedings of the …, 2024 - aclanthology.org
Incrementally expanding the capability of an existing translation model to solve new domain
tasks over time is a fundamental and practical problem, which usually suffers from …

Selective knowledge distillation for neural machine translation

F Wang, J Yan, F Meng, J Zhou - arXiv preprint arXiv:2105.12967, 2021 - arxiv.org
Neural Machine Translation (NMT) models achieve state-of-the-art performance on many
translation benchmarks. As an active research field in NMT, knowledge distillation is widely …

SMaLL-100: Introducing shallow multilingual machine translation model for low-resource languages

A Mohammadshahi, V Nikoulina, A Berard… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, multilingual machine translation models have achieved promising
performance on low-resource language pairs by sharing information between similar …

Deep versus wide: An analysis of student architectures for task-agnostic knowledge distillation of self-supervised speech models

T Ashihara, T Moriya, K Matsuura, T Tanaka - arXiv preprint arXiv …, 2022 - arxiv.org
Self-supervised learning (SSL) is seen as a very promising approach with high performance
for several speech downstream tasks. Since the parameters of SSL models are generally so …

Universal-KD: Attention-based output-grounded intermediate layer knowledge distillation

Y Wu, M Rezagholizadeh, A Ghaddar… - Proceedings of the …, 2021 - aclanthology.org
Intermediate layer matching is shown to be an effective approach for improving knowledge
distillation (KD). However, this technique applies matching in the hidden spaces of two …

Graph structure aware contrastive knowledge distillation for incremental learning in recommender systems

Y Wang, Y Zhang, M Coates - Proceedings of the 30th ACM International …, 2021 - dl.acm.org
Personalized recommender systems are playing an increasingly important role in online
services. Graph Neural Network (GNN) based recommender models have demonstrated a …