Transferring Core Knowledge via Learngenes

Y Xie, F Feng, J Wang, X Geng, Y Rui - arXiv preprint arXiv:2408.07337, 2024 - arxiv.org

Pre-trained models have become the preferred backbone due to the expansion of model
parameters, with techniques like Parameter-Efficient Fine-Tuning (PEFTs) typically fixing the …

Cited by 3


WAVE: Weight Template for Adaptive Initialization of Variable-sized Models

F Feng, Y Xie, J Wang, X Geng - arXiv preprint arXiv:2406.17503, 2024 - arxiv.org

The expansion of model parameters underscores the significance of pre-trained models;
however, the constraints encountered during model deployment necessitate models of …

Cited by 4


Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models

SY Xia, W Zhu, X Yang, X Geng - arXiv preprint arXiv:2404.16897, 2024 - arxiv.org

In practice, we usually need to build variable-sized models adapting for diverse resource
constraints in different application scenarios, where weight initialization is an important step …

Cited by 3


FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models

Y Xie, F Feng, R Shi, J Wang, X Geng - arXiv preprint arXiv:2409.19289, 2024 - arxiv.org

Diffusion models often face slow convergence, and existing efficient training techniques,
such as Parameter-Efficient Fine-Tuning (PEFT), are primarily designed for fine-tuning pre …

Cited by 1

Model Parameter Prediction Method for Accelerating Distributed DNN Training

W Liu, D Chen, M Tan, K Chen, Y Yin, WL Shang, J Li… - Computer Networks, 2024 - Elsevier

As the size of deep neural network (DNN) models and datasets increases, distributed
training becomes popular to reduce the training time. However, a severe communication …


SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning

W Li, P Fang, H Xue - arXiv preprint arXiv:2412.09073, 2024 - arxiv.org

Cross-Domain Few-Shot Learning (CD-FSL) aims to transfer knowledge from seen source
domains to unseen target domains, which is crucial for evaluating the generalization and …

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

S Xia, Y Zu, X Yang, X Geng - The Thirty-eighth Annual Conference on … - openreview.net

In practical scenarios, it is necessary to build variable-sized models to accommodate diverse
resource constraints, where weight initialization serves as a crucial step preceding training …
