bert2BERT: Towards Reusable Pretrained Language Models

C Chen, Y Yin, L Shang, X Jiang, Y Qin, F Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, researchers have tended to pre-train ever-larger language models to explore the
upper limit of deep models. However, large language model pre-training costs intensive …

When To Grow? A Fitting Risk-Aware Policy for Layer Growing in Deep Neural Networks

H Wu, W Wang, T Malepathirana… - Proceedings of the …, 2024 - ojs.aaai.org
Neural growth is the process of growing a small neural network into a larger one and has
been utilized to accelerate the training of deep neural networks. One crucial aspect of neural …
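To illustrate the kind of operation these papers study, here is a minimal sketch of growing a small network into a deeper one while preserving its function at the moment of growth, in the style of a Net2DeeperNet identity initialization. The model and helper names are hypothetical, and the code is not taken from any of the listed papers.

# Illustrative sketch of function-preserving depth growth (hypothetical,
# not the method of any specific paper cited above).
import torch
import torch.nn as nn

def make_identity_block(width: int) -> nn.Sequential:
    """Linear layer initialized to the identity, followed by ReLU.

    The inserted block receives post-ReLU (non-negative) activations,
    so ReLU(I @ x) == x and the grown network computes the same
    function as the small one at the moment of growth.
    """
    layer = nn.Linear(width, width)
    with torch.no_grad():
        layer.weight.copy_(torch.eye(width))
        layer.bias.zero_()
    return nn.Sequential(layer, nn.ReLU())

# Small "source" network: two hidden blocks plus an output head.
width = 16
small = nn.Sequential(
    nn.Linear(8, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 4),
)

# Grown network: one extra identity-initialized block inserted before the head.
grown = nn.Sequential(
    *list(small.children())[:-1],
    make_identity_block(width),
    small[-1],
)

x = torch.randn(3, 8)
assert torch.allclose(small(x), grown(x), atol=1e-6)  # same outputs, more depth

After growth, the new block's weights are trained normally; the identity initialization only guarantees that accuracy does not drop at the instant the extra capacity is added.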

Leveraging Neighbor Attention Initialization (NAI) for Efficient Training of Pretrained LLMs

Q Tan, J Zhang - Electronics, 2024 - mdpi.com
In the realm of pretrained language models (PLMs), the exponential increase in
computational resources and time required for training as model sizes expand presents a …