On-device language models: A comprehensive review

J Xu, Z Li, W Chen, Q Wang, X Gao, Q Cai… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of large language models (LLMs) has revolutionized natural language processing
applications, and running LLMs on edge devices has become increasingly attractive for …

Recommendation with generative models

Y Deldjoo, Z He, J McAuley, A Korikov… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM

H Ma, J Liu, R Krashinsky - arXiv preprint arXiv:2410.07531, 2024 - arxiv.org
Dropout, a network operator, when enabled can dramatically degrade the performance
of Flash-Attention, which in turn increases the end-to-end training time of Large-Language …

[PDF][PDF] LLM Inference Performance on Chiplet-based Architectures and Systems

S Oh, E Qin, Y Yang, M Zhang, R Parihar, A Pandya - Dimension - hotinfra24.github.io
Large Language Models (LLMs) have become increasingly prevalent, enabling a
wide range of tasks across various platforms, from handheld devices and wearables to large …