On-device language models: A comprehensive review

J Xu, Z Li, W Chen, Q Wang, X Gao, Q Cai… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of large language models (LLMs) revolutionized natural language processing
applications, and running LLMs on edge devices has become increasingly attractive for …

GPT-Neo with LoRA for better medical knowledge performance on the MultiMedQA dataset

J Blanco, C Lambert, O Thompson - 2024 - osf.io
The integration of Low-Rank Adaptation (LoRA) with the GPT-Neo model
significantly enhances its performance in medical knowledge tasks by leveraging the …
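
The snippet above names LoRA applied to GPT-Neo but does not show the configuration. As a minimal sketch of how such an adapter is commonly attached using the Hugging Face PEFT library, the rank, scaling factor, dropout, and target modules below are illustrative assumptions rather than the paper's reported settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Base model; EleutherAI/gpt-neo-1.3B is one publicly available GPT-Neo checkpoint.
base = "EleutherAI/gpt-neo-1.3B"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA configuration: the rank, scaling, dropout, and choice of attention
# projections to adapt are assumptions for illustration, not the paper's values.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projection names in GPT-Neo
)

# Wrap the base model so only the low-rank adapter weights are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Only the adapter matrices are updated during fine-tuning, which keeps the trainable parameter count to a small fraction of the full model.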

PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services

Z Yang, Y Yang, C Zhao, Q Guo, W He, W Ji - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the number of large language model (LLM) users, it is difficult for
bandwidth-constrained cloud servers to simultaneously process massive LLM services in …

Understanding GPU Architecture Implications on LLM Serving Workloads

Z Zhang - 2024 - research-collection.ethz.ch
Large language models (LLMs) have become a promising new technology. However,
the power of LLMs can only be unleashed with substantial computation. In this work, we first …