Survey on knowledge distillation for large language models: methods, evaluation, and application

C Yang, Y Zhu, W Lu, Y Wang, Q Chen, C Gao… - ACM Transactions on …, 2024 - dl.acm.org
Large Language Models (LLMs) have showcased exceptional capabilities in various
domains, attracting significant interest from both academia and industry. Despite their …

Speaker voice normalization for end-to-end speech translation

Z Xue, T Shi, X Zhang, D Xiong - Expert Systems with Applications, 2024 - Elsevier
Speaker voices exhibit acoustic variation. Our preliminary experiments reveal that
normalized voice can significantly improve end-to-end speech translation. To mitigate the …

Co-training and Co-distillation for Quality Improvement and Compression of Language Models

H Lee, R Hou, J Kim, D Liang, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge Distillation (KD) compresses computationally expensive pre-trained language
models (PLMs) by transferring their knowledge to smaller models, allowing their use in …
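The entry above describes knowledge distillation only at a high level. A minimal sketch of the standard temperature-scaled soft-target loss commonly used for KD (the function names and the NumPy dependency are my assumptions, not taken from this paper):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with a max-shift for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 as in Hinton et al.'s classic formulation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * temperature**2)
```

In practice this term is mixed with the ordinary cross-entropy on hard labels; the exact objective used in the cited paper may differ.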

Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling

Y Yao, H Wu, M Liu, S Luo, X Han, J Liu, Z Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) exhibit varying strengths and weaknesses across different
tasks, prompting recent studies to explore the benefits of ensembling models to leverage …

Step Out and Seek Around: On Warm-Start Training with Incremental Data

M Shen, H Yin, P Molchanov, L Mao… - arXiv preprint arXiv …, 2024 - arxiv.org
Data often arrives in sequence over time in real-world deep learning applications such as
autonomous driving. When new training data is available, training the model from scratch …

Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning

R Ye, Y Xiao, B Hui - arXiv preprint arXiv:2410.12621, 2024 - arxiv.org
As large language models (LLMs) continue to advance, ensuring their alignment with
human values becomes increasingly critical. Traditional alignment methods heavily rely on …

Induced Model Matching: Restricted Models Help Train Full-Featured Models

U Muneeb, MI Ohannessian - The Thirty-eighth Annual Conference on … - openreview.net
We consider scenarios where a very accurate (often small) predictive model using restricted
features is available when training a full-featured (often larger) model. This restricted model …

[PDF] Utilizing External and Internal Knowledge for Engaging Open-Domain Dialogue Response Generation

R Choudhary - 2023 - waseda.repo.nii.ac.jp
This thesis investigates the incorporation of knowledge into generative open-domain
dialogue systems, a pivotal challenge in natural language processing characterized by the …