PowerInfer: Fast large language model serving with a consumer-grade GPU

Y Song, Z Mi, H Xie, H Chen - Proceedings of the ACM SIGOPS 30th …, 2024 - dl.acm.org
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference
engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key …

Fast distributed inference serving for large language models

B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low latency …

Museformer: Transformer with fine- and coarse-grained attention for music generation

B Yu, P Lu, R Wang, W Hu, X Tan… - Advances in …, 2022 - proceedings.neurips.cc
Symbolic music generation aims to generate music scores automatically. A recent trend is to
use Transformer or its variants in music generation, which is, however, suboptimal, because …

SparseTIR: Composable abstractions for sparse compilation in deep learning

Z Ye, R Lai, J Shao, T Chen, L Ceze - Proceedings of the 28th ACM …, 2023 - dl.acm.org
Sparse tensors are rapidly becoming critical components of modern deep learning
workloads. However, developing high-performance sparse operators can be difficult and …

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …

Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity

H Xia, Z Zheng, Y Li, D Zhuang, Z Zhou, X Qiu… - arXiv preprint arXiv …, 2023 - arxiv.org
With the fast growth of parameter size, it becomes increasingly challenging to deploy large
generative models as they typically require large GPU memory consumption and massive …

Optimizing dynamic neural networks with brainstorm

W Cui, Z Han, L Ouyang, Y Wang, N Zheng… - … USENIX Symposium on …, 2023 - usenix.org
Dynamic neural networks (NNs), which can adapt sparsely activated sub-networks to inputs
during inference, have shown significant advantages over static ones in terms of accuracy …

PIT: Optimization of dynamic sparse deep learning models via permutation invariant transformation

N Zheng, H Jiang, Q Zhang, Z Han, L Ma… - Proceedings of the 29th …, 2023 - dl.acm.org
Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant
challenge to deep learning. The state-of-the-art sparsity-aware deep learning solutions are …

Register Tiling for Unstructured Sparsity in Neural Network Inference

L Wilkinson, K Cheshmi, MM Dehnavi - Proceedings of the ACM on …, 2023 - dl.acm.org
Unstructured sparse neural networks are an important class of machine learning (ML)
models, as they compact model size and reduce floating point operations. The execution …

Looplets: A language for structured coiteration

W Ahrens, D Donenfeld, F Kjolstad… - Proceedings of the 21st …, 2023 - dl.acm.org
Real world arrays often contain underlying structure, such as sparsity, runs of repeated
values, or symmetry. Specializing for structure yields significant speedups. But automatically …