bert2BERT: Towards Reusable Pretrained Language Models

C Chen, Y Yin, L Shang, X Jiang, Y Qin, F Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, researchers have tended to pre-train ever-larger language models to explore the
upper limit of deep models. However, large language model pre-training costs intensive …

When To Grow? A Fitting Risk-Aware Policy for Layer Growing in Deep Neural Networks

H Wu, W Wang, T Malepathirana… - Proceedings of the …, 2024 - ojs.aaai.org
Neural growth is the process of growing a small neural network into a larger one and has
been utilized to accelerate the training of deep neural networks. One crucial aspect of neural …
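To illustrate the kind of operation these papers study, here is a minimal sketch of growing a small network into a deeper one while preserving its function at the moment of growth, in the style of a Net2DeeperNet identity initialization. The model and helper names are hypothetical, and the code is not taken from any of the listed papers.

# Illustrative sketch of function-preserving depth growth (hypothetical,
# not the method of any specific paper cited above).
import torch
import torch.nn as nn

def make_identity_block(width: int) -> nn.Sequential:
    """Linear layer initialized to the identity, followed by ReLU.

    The inserted block receives post-ReLU (non-negative) activations,
    so ReLU(I @ x) == x and the grown network computes the same
    function as the small one at the moment of growth.
    """
    layer = nn.Linear(width, width)
    with torch.no_grad():
        layer.weight.copy_(torch.eye(width))
        layer.bias.zero_()
    return nn.Sequential(layer, nn.ReLU())

# Small "source" network: two hidden blocks plus an output head.
width = 16
small = nn.Sequential(
    nn.Linear(8, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 4),
)

# Grown network: one extra identity-initialized block inserted before the head.
grown = nn.Sequential(
    *list(small.children())[:-1],
    make_identity_block(width),
    small[-1],
)

x = torch.randn(3, 8)
assert torch.allclose(small(x), grown(x), atol=1e-6)  # same outputs, more depth

After growth, the new block's weights are trained normally; the identity initialization only guarantees that accuracy does not drop at the instant the extra capacity is added.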

Leveraging Neighbor Attention Initialization (NAI) for Efficient Training of Pretrained LLMs

Q Tan, J Zhang - Electronics, 2024 - mdpi.com
In the realm of pretrained language models (PLMs), the exponential increase in
computational resources and time required for training as model sizes expand presents a …