Towards Efficient Pre-Trained Language Model via Feature Correlation Distillation

K. Huang, X. Guo, M. Wang - Advances in Neural Information Processing Systems (NeurIPS), 2023 - proceedings.neurips.cc

Knowledge Distillation (KD) has emerged as a promising approach for compressing large Pre-trained Language Models (PLMs). The performance of KD relies on how to effectively …
