R. Tang, Y. Lu, J. Lin. Proceedings of the 2nd Workshop on Deep …, 2019. aclanthology.org
Abstract: Knowledge distillation can effectively transfer knowledge from BERT, a deep language representation model, to traditional, shallow word embedding-based neural …