H Fu, S Zhou, Q Yang, J Tang, G Liu, K Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-trained models such as BERT have achieved great results on various natural
language processing problems. However, their large number of parameters requires significant …