Although pre-trained language models such as BERT have achieved appealing performance in a wide range of Natural Language Processing (NLP) tasks, they are computationally …
Since the introduction of the original BERT (i.e., BERT-Base), researchers have developed various customized BERT models with improved performance for specific domains and tasks …
We perform a knowledge distillation (KD) benchmark from task-specific BERT-base teacher models to various student models: BiLSTM, CNN, BERT-Tiny, BERT-Mini, and BERT-Small …
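The snippet above names the teacher and student architectures but not the distillation objective itself. As a hedged illustration only, the sketch below shows the standard soft-label KD loss (temperature-scaled KL divergence mixed with cross-entropy) from a BERT-base teacher to a BERT-Tiny student; the Hugging Face checkpoint names, temperature, and mixing weight are assumptions for the example, not details taken from the paper.

```python
# Minimal sketch of one KD training step, assuming PyTorch + Hugging Face
# transformers and a 2-class task. Checkpoints and hyperparameters are
# illustrative assumptions, not the benchmark's actual configuration.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

teacher = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
student = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL between temperature-scaled teacher and student
    # distributions, rescaled by T^2 as in standard soft-label KD.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

batch = tokenizer(["a great movie", "a dull movie"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
with torch.no_grad():
    t_logits = teacher(**batch).logits   # the teacher stays frozen during distillation
s_logits = student(**batch).logits
loss = kd_loss(s_logits, t_logits, labels)
loss.backward()
```

The same step applies unchanged to the other students in the list (BiLSTM, CNN, BERT-Mini, BERT-Small), since only the student forward pass differs.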
Thanks to advances in computational hardware and the abundance and variety of available data, deep learning has achieved significant success in natural language processing, computer …
Learning performance data (e.g., quiz scores and attempts) are important for understanding learner engagement and knowledge mastery levels. However, the learning performance data …
Deep learning models have demonstrated their effectiveness in capturing complex relationships between input features and target outputs across many different application …
Educational content labeled with proper knowledge components (KCs) is particularly useful to teachers and content organizers. However, manually labeling educational content is …
Contextualized entity representations learned by state-of-the-art transformer-based language models (TLMs) such as BERT, GPT, and T5 leverage the attention mechanism to …
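As context for how such contextualized entity representations are typically read out in practice, here is a minimal sketch (an illustrative assumption, not the paper's method): it mean-pools the final-layer BERT wordpiece states that fall inside an entity mention, using Hugging Face transformers. The sentence, mention, checkpoint, and pooling choice are all assumptions for the example.

```python
# Sketch: extract a contextualized vector for one entity mention by pooling
# the final-layer hidden states of its wordpieces. Assumes a fast tokenizer
# (needed for offset mapping) and bert-base-uncased as an example checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Apple unveiled a new chip in Cupertino."
entity = "Apple"

enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0].tolist()  # per-wordpiece character spans
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]   # (seq_len, 768)

# Keep wordpieces whose character span lies inside the mention; special
# tokens have empty (s == e) spans and are excluded automatically.
start, end = sentence.index(entity), sentence.index(entity) + len(entity)
mask = torch.tensor([s >= start and e <= end and s != e for s, e in offsets])
entity_vec = hidden[mask].mean(dim=0)            # mean-pooled contextual entity vector
print(entity_vec.shape)                           # torch.Size([768])
```

Mean pooling is just one common readout; first-subword or attention-weighted pooling are equally standard alternatives.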
In this paper, we propose Stochastic Knowledge Distillation (SKD) to obtain a compact BERT-style language model dubbed SKDBERT. In each distillation iteration, SKD samples a …
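The snippet is truncated at the key clause. Assuming, as is common in stochastic-distillation setups, that what is sampled each iteration is a single teacher from a fixed pool of teachers with different capacities, one such step might look like the hedged sketch below; the pool, sampling weights, temperature, and loss form are illustrative assumptions, not SKDBERT's published procedure.

```python
# Hedged sketch of one stochastic-distillation iteration: draw one teacher
# from a multi-capacity pool, then distill one-to-one into the student.
import random
import torch
import torch.nn.functional as F

def skd_step(student, teacher_pool, probs, batch, labels, T=2.0):
    """One iteration: sample a single teacher, compute a KD + CE loss."""
    teacher = random.choices(teacher_pool, weights=probs, k=1)[0]
    with torch.no_grad():
        t_logits = teacher(**batch).logits       # sampled teacher, frozen
    s_logits = student(**batch).logits
    soft = F.kl_div(                             # temperature-scaled soft targets
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return soft + F.cross_entropy(s_logits, labels)
```

Resampling the teacher each iteration lets the student see supervision at multiple capacity levels over training, instead of committing to a single (possibly too large) teacher.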