Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …

Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

H Pouransari, CL Li, JHR Chang, PKA Vasu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are commonly trained on datasets consisting of fixed-length
token sequences. These datasets are created by randomly concatenating documents of …

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning

K Lu, Z Liang, X Nie, D Pan, S Zhang, K Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The effectiveness of long-context modeling is important for Large Language Models (LLMs)
in various applications. Despite their potential, LLMs' efficacy in processing long context …

Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation

Y Zhou, M Keuper, M Fritz - arXiv preprint arXiv:2408.13586, 2024 - arxiv.org
Sampling-based decoding strategies have been widely adopted for Large Language
Models (LLMs) in numerous applications, which target a balance between diversity and …

Bucket Pre-training is All You Need

H Liu, Q Peng, Q Yang, K Liu, H Xu - arXiv preprint arXiv:2407.07495, 2024 - arxiv.org
Large language models (LLMs) have demonstrated exceptional performance across various
natural language processing tasks. However, the conventional fixed-length data …