Sparse Mixture of Experts Language Models Excel in Knowledge Distillation

H Xu, H Liu, W Gong, X Deng, H Wang - CCF International Conference on …, 2024 - Springer
Abstract: Knowledge distillation is an effective method for reducing the computational
overhead of large language models. However, recent optimization efforts in distilling large …
