Current datasets for long-form video understanding often fall short of providing genuine long-form comprehension challenges, as many tasks derived from these datasets can be …
P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable …
Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even …
X Chan, X Wang, D Yu, H Mi, D Yu - arXiv preprint arXiv:2406.20094, 2024 - arxiv.org
We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully …
M Hirano, K Imajo - arXiv preprint arXiv:2404.10555, 2024 - arxiv.org
Large language models (LLMs) are now widely used in various fields, including finance. However, no Japanese financial-specific LLMs have yet been proposed. Hence, this study …
Large Language Model (LLM) pre-training exhausts an ever-growing compute budget, yet recent research has demonstrated that careful document selection enables comparable …
Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. To make the CPT approach more traceable, this paper presents …
To ensure performance on a diverse set of downstream tasks, LLMs are pretrained via data mixtures over different domains. In this work, we demonstrate that the optimal data …
Z Yu, S Das, C Xiong - arXiv preprint arXiv:2406.06046, 2024 - arxiv.org
Pretraining data selection has the potential to improve language model pretraining efficiency by utilizing higher-quality data from massive web data corpora. Current data selection …