LLM-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness

O Friha, MA Ferrag, B Kantarci… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …
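
As background for the quantization-aware training (QAT) named in the title: a minimal sketch of the fake-quantization plus straight-through-estimator pattern that QAT methods build on. The function name fake_quant and all shapes are illustrative, not EfficientQAT's actual API.

```python
import torch

def fake_quant(w, n_bits=4):
    # Symmetric uniform fake-quantization: round weights to an n-bit
    # grid, then dequantize, so training "sees" quantization error.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward pass uses w_q, backward pass
    # treats rounding as identity so gradients reach w.
    return w + (w_q - w).detach()

# Toy QAT step: the matmul uses quantized weights, but the gradient
# updates the underlying full-precision weights.
w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(8, 16)
loss = (x @ fake_quant(w).T).pow(2).mean()
loss.backward()
print(w.grad is not None)  # True
```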

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …
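
Since this survey covers low-bit basics, a self-contained sketch of the uniform affine quantize/dequantize round trip that most low-bit schemes start from may help orient the reader; the helper names are hypothetical.

```python
import numpy as np

def quantize(x, n_bits=4):
    # Asymmetric uniform quantization to unsigned n-bit integers.
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(1024).astype(np.float32)
q, s, z = quantize(x)
print(np.abs(x - dequantize(q, s, z)).mean())  # mean quantization error
```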

BiLLM: Pushing the limit of post-training quantization for LLMs

W Huang, Y Liu, H Qin, Y Li, S Zhang, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained large language models (LLMs) exhibit exceptional general language processing
capabilities but come with significant demands on memory and computational resources. As …
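
BiLLM pushes post-training quantization toward 1 bit; its actual method treats salient and non-salient weights differently, but the core primitive it extends is classic weight binarization, sketched here with an illustrative function name.

```python
import numpy as np

def binarize_rowwise(W):
    # 1-bit weights: sign(W) times a per-row scale. The scale that
    # minimizes ||W - a * sign(W)||^2 is the row mean of |W|.
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    return alpha * np.sign(W)

W = np.random.randn(8, 64)
W_bin = binarize_rowwise(W)
print(np.linalg.norm(W - W_bin) / np.linalg.norm(W))  # relative error
```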

How good are low-bit quantized LLaMA3 models? An empirical study

W Huang, X Ma, H Qin, X Zheng, C Lv, H Chen… - arXiv e …, 2024 - ui.adsabs.harvard.edu
Meta's LLaMA family has become one of the most powerful open-source Large Language
Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve …
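
The study measures how quality degrades as bit-width shrinks; a crude stand-in for that kind of sweep is the weight reconstruction error of a quantize-dequantize round trip. This is illustrative only and bears no relation to the paper's actual benchmarks.

```python
import numpy as np

def roundtrip_error(x, n_bits):
    # Relative error after a symmetric uniform quantize-dequantize.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    x_hat = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

x = np.random.randn(4096).astype(np.float32)
for bits in (8, 4, 3, 2):
    print(f"{bits}-bit relative error: {roundtrip_error(x, bits):.4f}")
```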

WKVQuant: Quantizing weight and key/value cache for large language models gains more

Y Yue, Z Yuan, H Duanmu, S Zhou, J Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) face significant deployment challenges due to their
substantial memory requirements and the computational demands of auto-regressive text …
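
The memory pressure here comes largely from the auto-regressive key/value cache. Below is a minimal sketch of per-token asymmetric quantization of a cached K or V tensor; the shapes and helper names are assumed, and this is not WKVQuant's exact scheme.

```python
import numpy as np

def quant_kv(t, n_bits=4):
    # One scale/offset per token vector of a cache tensor shaped
    # (heads, seq_len, head_dim).
    lo = t.min(axis=-1, keepdims=True)
    hi = t.max(axis=-1, keepdims=True)
    scale = (hi - lo) / (2 ** n_bits - 1)
    q = np.round((t - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequant_kv(q, scale, lo):
    return q.astype(np.float32) * scale + lo

k_cache = np.random.randn(8, 128, 64).astype(np.float32)
q, s, lo = quant_kv(k_cache)
print(np.abs(k_cache - dequant_kv(q, s, lo)).max())  # worst-case error
```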

LLaVA-PruMerge: Adaptive token reduction for efficient large multimodal models

Y Shang, M Cai, B Xu, YJ Lee, Y Yan - arXiv preprint arXiv:2403.15388, 2024 - arxiv.org
Large Multimodal Models (LMMs) have shown significant reasoning capabilities by
connecting a visual encoder and a large language model. LMMs typically use a fixed …
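
PruMerge both prunes and merges visual tokens; the sketch below shows only the pruning half, keeping the top-scoring tokens by some saliency signal. The names and the random scores are placeholders, not the paper's method.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.25):
    # Keep the highest-scoring visual tokens, preserving order, so the
    # LLM processes a much shorter multimodal sequence.
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])
    return tokens[keep]

tokens = np.random.randn(576, 1024)  # e.g. 24x24 ViT patch embeddings
scores = np.random.rand(576)         # stand-in for attention saliency
print(prune_tokens(tokens, scores).shape)  # (144, 1024)
```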

APTQ: Attention-aware post-training mixed-precision quantization for large language models

Z Guan, H Huang, Y Su, H Huang, N Wong… - Proceedings of the 61st …, 2024 - dl.acm.org
Large Language Models (LLMs) have greatly advanced the natural language processing
paradigm. However, the high computational load and huge model sizes pose a grand …
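
Mixed-precision PTQ needs a rule for spending a global bit budget across layers. A toy greedy allocator that favors more "sensitive" layers is sketched below; APTQ's own sensitivity metric is attention-aware, and everything here is a generic stand-in.

```python
def allocate_bits(sensitivity, budget, choices=(2, 3, 4, 8)):
    # Greedy mixed-precision assignment: start every layer at the
    # lowest width, then upgrade the most sensitive layers while the
    # total bit budget allows it.
    bits = [min(choices)] * len(sensitivity)
    spent = sum(bits)
    order = sorted(range(len(sensitivity)),
                   key=lambda i: -sensitivity[i])
    for i in order:
        for b in sorted(choices):
            if b > bits[i] and spent + b - bits[i] <= budget:
                spent += b - bits[i]
                bits[i] = b
    return bits

sens = [0.9, 0.1, 0.5, 0.05]           # e.g. attention blocks score high
print(allocate_bits(sens, budget=16))  # -> [8, 2, 4, 2]
```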

BitDistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation

D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
The upscaling of Large Language Models (LLMs) has yielded impressive advances in
natural language processing, yet it also poses significant deployment challenges. Weight …
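
The self-distillation idea is to let the model's own full-precision weights teach its quantized copy. The standard soft-label distillation objective this builds on is sketched below; BitDistiller's actual loss is a refinement of this generic form.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Soft-label distillation: KL divergence between temperature-
    # softened teacher and student distributions, scaled by T^2.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * T * T

teacher = torch.randn(4, 32000)      # full-precision model's logits
student = teacher + 0.1 * torch.randn_like(teacher)  # quantized copy
print(kd_loss(student, teacher).item())
```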

RoLoRA: Fine-tuning rotated outlier-free LLMs for effective weight-activation quantization

X Huang, Z Liu, SY Liu, KT Cheng - arXiv preprint arXiv:2407.08044, 2024 - arxiv.org
Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT)
method, significantly enhances the training efficiency by updating only a small portion of the …
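
The "rotated outlier-free" part relies on an equivalence: multiplying activations and weights by the same orthogonal matrix leaves layer outputs unchanged while spreading outlier channels, making both easier to quantize. A minimal numpy sketch follows, using a random orthogonal matrix rather than the Hadamard-style transforms typically used in this line of work.

```python
import numpy as np

def random_rotation(d, seed=0):
    # Random orthogonal matrix via QR decomposition.
    q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((d, d)))
    return q

d = 256
x = np.random.randn(4, d)
W = np.random.randn(d, d)
W[:, 0] *= 50                        # plant an outlier input channel
R = random_rotation(d)
x_rot, W_rot = x @ R, W @ R          # rotate activations and weights alike

print(np.allclose(x @ W.T, x_rot @ W_rot.T))      # output is unchanged
print(np.abs(W).max() / np.abs(W).std(),
      np.abs(W_rot).max() / np.abs(W_rot).std())  # outlier ratio drops
```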