Llemma: An open language model for mathematics

Z Azerbayev, H Schoelkopf, K Paster… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …

Sheared LLaMA: Accelerating language model pre-training via structured pruning

M Xia, T Gao, Z Zeng, D Chen - arXiv preprint arXiv:2310.06694, 2023 - arxiv.org
The popularity of LLaMA (Touvron et al., 2023a; b) and other recently emerged moderate-
sized large language models (LLMs) highlights the potential of building smaller yet powerful …

LLaMA Pro: Progressive LLaMA with block expansion

C Wu, Y Gan, Y Ge, Z Lu, J Wang, Y Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans generally acquire new skills without compromising the old; however, the opposite
holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we …

Trends and challenges of real-time learning in large language models: A critical review

M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time,
enabling their adaptation and generalization to novel tasks. It is a critical ability for …

A survey of pre-trained language models for processing scientific text

X Ho, AKD Nguyen, AT Dao, J Jiang, Y Chida… - arXiv preprint arXiv …, 2024 - arxiv.org
The number of Language Models (LMs) dedicated to processing scientific text is on the rise.
Keeping pace with the rapid growth of scientific LMs (SciLMs) has become a daunting task …

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

J Parmar, S Satheesh, M Patwary, M Shoeybi… - arXiv preprint arXiv …, 2024 - arxiv.org
As language models have scaled both their number of parameters and pretraining dataset
sizes, the computational cost for pretraining has become intractable except for the most well …

Open-FinLLMs: Open multimodal large language models for financial applications

Q Xie, D Li, M Xiao, Z Jiang, R Xiang, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have advanced financial applications, yet they often lack
sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables …

On the vulnerability of safety alignment in open-access LLMs

J Yi, R Ye, Q Chen, B Zhu, S Chen, D Lian… - Findings of the …, 2024 - aclanthology.org
Large language models (LLMs) possess immense capabilities but are susceptible to
malicious exploitation. To mitigate the risk, safety alignment is employed to align LLMs with …

Me-LLaMA: Foundation large language models for medical applications

Q Xie, Q Chen, A Chen, C Peng, Y Hu, F Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent large language models (LLMs) like ChatGPT and LLaMA have shown great promise
in many AI applications. However, their performance on medical tasks is suboptimal and can …

Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels

L Song, E Segal, E Xing - arXiv preprint arXiv:2412.06993, 2024 - arxiv.org
We present an approach of using AI to model and simulate biology and life. Why is it
important? Because at the core of medicine, pharmacy, public health, longevity, agriculture …