Fewer truncations improve language modeling

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org

While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …

Cited by 65 · Related articles · All 3 versions

Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

H Pouransari, CL Li, JHR Chang, PKA Vasu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) are commonly trained on datasets consisting of fixed-length
token sequences. These datasets are created by randomly concatenating documents of …

Cited by 3 · Related articles · All 2 versions

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

Cited by 1 · Related articles · All 2 versions

DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning

K Lu, Z Liang, X Nie, D Pan, S Zhang, K Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

The effectiveness of long-context modeling is important for Large Language Models (LLMs)
in various applications. Despite their potential, LLMs' efficacy in processing long context …

Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation

Y Zhou, M Keuper, M Fritz - arXiv preprint arXiv:2408.13586, 2024 - arxiv.org

Sampling-based decoding strategies have been widely adopted for Large Language
Models (LLMs) in numerous applications, which target a balance between diversity and …

Bucket Pre-training is All You Need

H Liu, Q Peng, Q Yang, K Liu, H Xu - arXiv preprint arXiv:2407.07495, 2024 - arxiv.org

Large language models (LLMs) have demonstrated exceptional performance across various
natural language processing tasks. However, the conventional fixed-length data …
