Efficient training of large language models on distributed infrastructures: A survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels

L Wei, H Yan, X Lu, J Zhu, J Wang, W Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have been well-researched in many long-context tasks.
However, due to high annotation costs, high-quality long-context summary datasets for …

The CAP Principle for LLM Serving

P Zeng, Z Ning, J Zhao, W Cui, M Xu, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
We survey the large language model (LLM) serving area to understand the intricate
dynamics between cost-efficiency and accuracy, which is magnified by the growing need for …

AttentionX: Exploiting Consensus Discrepancy In Attention from A Distributed Optimization Perspective

G Zhang, R Heusdens - arXiv preprint arXiv:2409.04275, 2024 - arxiv.org
In this paper, we extend the standard Attention in transformers by exploiting the consensus
discrepancy from a distributed optimization perspective, referred to as AttentionX. It is noted …