T Wu, C Hou, S Lao, J Li, N Wong, Z Zhao… - arXiv preprint arXiv…, 2023 - arxiv.org
Knowledge Distillation (KD) is a predominant approach for BERT compression. Previous KD-based methods focus on designing extra alignment losses for the student model to mimic the …
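The alignment losses mentioned above typically follow the standard KD recipe, in which the student is trained to match the teacher's softened output distribution. Below is a minimal PyTorch sketch of one such loss (KL divergence on temperature-scaled logits); it is illustrative only and is not the specific loss proposed or compared in this paper, and the function name and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def kd_alignment_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-label alignment loss: the student mimics the teacher's
    softened output distribution (standard logit distillation)."""
    # Soften both distributions with the temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 so gradient magnitudes stay comparable
    # across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Usage example with random logits (batch of 8, BERT vocabulary size 30522).
student_logits = torch.randn(8, 30522)
teacher_logits = torch.randn(8, 30522)
loss = kd_alignment_loss(student_logits, teacher_logits)
```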