H Xu, H Liu, W Gong, X Deng, H Wang - CCF International Conference …, 2024 - dl.acm.org
Abstract: Knowledge distillation is an effective method for reducing the computational
overhead of large language models. However, recent optimization efforts in distilling large …