J Liu, Z Bai, Y Zhang, C Zhang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Typically, training LLMs with long context sizes is computationally expensive, requiring
extensive training hours and GPU resources. Existing long-context extension methods …