Scaling laws of RoPE-based extrapolation

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Cited by 2407 · Related articles · All 4 versions

FiT: Flexible vision transformer for diffusion model

Z Lu, Z Wang, D Huang, C Wu, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Nature is infinitely resolution-free. In the context of this reality, existing diffusion models, such
as Diffusion Transformers, often face challenges when processing image resolutions outside …

Cited by 12 · Related articles · All 3 versions

Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

T Ju, Y Wang, X Ma, P Cheng, H Zhao, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted
their impressive capabilities in various applications, such as collaborative problem-solving …

Cited by 1 · Related articles · All 4 versions

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

H Que, J Liu, G Zhang, C Zhang, X Qu, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to
expand the model's fundamental understanding of specific downstream domains (e.g., math …

Cited by 5 · Related articles · All 2 versions

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Z He, G Feng, S Luo, K Yang, D He, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we leverage the intrinsic segmentation of language sequences and design a
new positional encoding method called Bilevel Positional Encoding (BiPE). For each …

Cited by 1 · Related articles · All 3 versions

Context Length Extension via Generalized Extrapolation Scale

L Li, Z Huaping - Findings of the Association for Computational …, 2024 - aclanthology.org

Context length expansion of transformer models is considered a key challenge, especially
when handling context beyond the training length during the inference stage. In this paper, we …

Institutional Platform for Secure Self-Service Large Language Model Exploration

VK Bumgardner, MA Klusty, WV Logan… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper introduces a user-friendly platform developed by the University of Kentucky
Center for Applied AI, designed to make large, customized language models (LLMs) more …
