Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in …
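For reference, the classic soft-target objective that most KD work builds on can be sketched in a few lines of PyTorch; this is a generic illustration, not the specific method of the excerpt above, and kd_loss, T, and alpha are placeholder names.

```python
# Minimal sketch of the classic knowledge-distillation objective
# (temperature-softened teacher targets + hard labels). Illustrative only.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between the temperature-softened
    # student and teacher distributions, scaled by T^2 so its gradient
    # magnitude stays comparable to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```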
Large language models (LLMs) exhibit varying strengths and weaknesses across different tasks, prompting recent studies to explore the benefits of ensembling models to leverage …
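One simple way such an ensemble can be realized is to average the models' next-token distributions; the sketch below assumes Hugging Face-style causal LMs that share a vocabulary, and the combination scheme and names are illustrative rather than the method of the excerpt above.

```python
# Hedged sketch: uniform averaging of next-token probabilities across models
# that share a tokenizer/vocabulary. Assumes Hugging Face-style models whose
# forward pass returns an object with a .logits tensor.
import torch

@torch.no_grad()
def ensemble_next_token(models, input_ids):
    probs = None
    for model in models:
        logits = model(input_ids).logits[:, -1, :]   # logits at the last position
        p = torch.softmax(logits, dim=-1)
        probs = p if probs is None else probs + p
    probs = probs / len(models)                      # uniform average over models
    return probs.argmax(dim=-1)                      # greedy next-token choice
```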
Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving. When new training data is available, training the model from scratch …
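The incremental alternative usually amounts to resuming from the previous checkpoint and fine-tuning on the newly arrived data rather than retraining on everything; a minimal sketch, with placeholder paths and hyperparameters, might look as follows.

```python
# Sketch of incremental training: load the previously trained weights and
# fine-tune only on the new batch of data. Paths, loaders, and settings are
# placeholders, not taken from the excerpt above.
import torch

def update_on_new_data(model, new_loader, ckpt_path="prev_checkpoint.pt",
                       lr=1e-4, epochs=1, device="cuda"):
    model.load_state_dict(torch.load(ckpt_path))     # resume, not from scratch
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in new_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    torch.save(model.state_dict(), ckpt_path)        # checkpoint for the next round
    return model
```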
R. Ye, Y. Xiao, B. Hui. arXiv preprint arXiv:2410.12621, 2024 (arxiv.org).
As large language models (LLMs) continue to advance, ensuring their alignment with human values becomes increasingly critical. Traditional alignment methods heavily rely on …
U. Muneeb, M. I. Ohannessian. The Thirty-eighth Annual Conference on … (openreview.net).
We consider scenarios where a very accurate (often small) predictive model using restricted features is available when training a full-featured (often larger) model. This restricted model …
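One common way to exploit such a restricted model is as an auxiliary teacher: the full-featured model is trained on its own task loss plus a term pulling its predictions toward the restricted model's. The sketch below is a generic illustration under that assumption; guided_loss and beta are placeholder names.

```python
# Hedged sketch: a restricted-feature teacher guides a full-featured student.
# The student sees all features; the teacher sees only its restricted subset.
import torch
import torch.nn.functional as F

def guided_loss(full_model, restricted_model, x_full, x_restricted, y, beta=0.3):
    student_logits = full_model(x_full)                  # uses the full feature set
    with torch.no_grad():
        teacher_logits = restricted_model(x_restricted)  # uses restricted features only
    task = F.cross_entropy(student_logits, y)            # ordinary supervised loss
    guide = F.kl_div(                                    # match the restricted model
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return task + beta * guide
```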
This thesis investigates the incorporation of knowledge into generative open-domain dialogue systems, a pivotal challenge in natural language processing characterized by the …