BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Citing articles (4 results)

Efficient training of large language models on distributed infrastructures: A survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

Cited by 4 · All 5 versions

CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels

L Wei, H Yan, X Lu, J Zhu, J Wang, W Zhang - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have been well-researched in many long-context tasks.
However, due to high annotation costs, high-quality long-context summary datasets for …

The CAP Principle for LLM Serving

P Zeng, Z Ning, J Zhao, W Cui, M Xu, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

We survey the large language model (LLM) serving area to understand the intricate
dynamics between cost-efficiency and accuracy, which is magnified by the growing need for …

Cited by 4 · All 2 versions

AttentionX: Exploiting Consensus Discrepancy In Attention from A Distributed Optimization Perspective

G Zhang, R Heusdens - arXiv preprint arXiv:2409.04275, 2024 - arxiv.org

In this paper, we extend the standard Attention in transformer by exploiting the consensus
discrepancy from a distributed optimization perspective, referred to as AttentionX. It is noted …
