Large language models and games: A survey and roadmap

R Gallotta, G Todd, M Zammit, S Earle, A Liapis… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have seen an explosive increase in research on large language models
(LLMs), and accompanying public engagement on the topic. While starting as a niche area …

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …

SpotServe: Serving generative large language models on preemptible instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - Proceedings of the 29th …, 2024 - dl.acm.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …

Break the sequential dependency of LLM inference using lookahead decoding

Y Fu, P Bailis, I Stoica, H Zhang - arXiv preprint arXiv:2402.02057, 2024 - arxiv.org
Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded,
resulting in high latency and significant wastes of the parallel processing power of modern …

A comprehensive survey of large language models and multimodal large language models in medicine

H Xiao, F Zhou, X Liu, T Liu, Z Li, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the release of ChatGPT and GPT-4, large language models (LLMs) and multimodal
large language models (MLLMs) have garnered significant attention due to their powerful …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

Reducing LLM hallucination using knowledge distillation: A case study with Mistral Large and MMLU benchmark

D McDonald, R Papadopoulos, L Benningfield - Authorea Preprints, 2024 - techrxiv.org
The application of knowledge distillation to reduce hallucination in large language models
represents a novel and significant advancement in enhancing the reliability and accuracy of …

WKVQuant: Quantizing weight and key/value cache for large language models gains more

Y Yue, Z Yuan, H Duanmu, S Zhou, J Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) face significant deployment challenges due to their
substantial memory requirements and the computational demands of auto-regressive text …

A survey on the memory mechanism of large language model based agents

Z Zhang, X Bo, C Ma, R Li, X Chen, Q Dai, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model (LLM) based agents have recently attracted much attention from the
research and industry communities. Compared with original LLMs, LLM-based agents are …