On-device language models: A comprehensive review

J Xu, Z Li, W Chen, Q Wang, X Gao, Q Cai… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of large language models (LLMs) has revolutionized natural language processing
applications, and running LLMs on edge devices has become increasingly attractive for …

Recommendation with generative models

Y Deldjoo, Z He, J McAuley, A Korikov… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM

H Ma, J Liu, R Krashinsky - arXiv preprint arXiv:2410.07531, 2024 - arxiv.org
Dropout, a network operator, when enabled can dramatically degrade the performance
of Flash-Attention, which in turn increases the end-to-end training time of Large-Language …

[PDF][PDF] LLM Inference Performance on Chiplet-based Architectures and Systems

S Oh, E Qin, Y Yang, M Zhang, R Parihar, A Pandya - Dimension - hotinfra24.github.io
Large Language Models (LLMs) have become increasingly prevalent, enabling a
wide range of tasks across various platforms, from handheld devices and wearables to large …