LLM-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness

O Friha, MA Ferrag, B Kantarci… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …

EfficientQAT: Efficient quantization-aware training for large language models

M Chen, W Shao, P Xu, J Wang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are crucial in modern natural language processing and
artificial intelligence. However, they face challenges in managing their significant memory …
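
As background for the quantization-aware training (QAT) named in the title: a minimal sketch of the fake-quantization plus straight-through-estimator pattern that QAT methods build on. The function name fake_quant and all shapes are illustrative, not EfficientQAT's actual API.

```python
import torch

def fake_quant(w, n_bits=4):
    # Symmetric uniform fake-quantization: round weights to an n-bit
    # grid, then dequantize, so training "sees" quantization error.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward pass uses w_q, backward pass
    # treats rounding as identity so gradients reach w.
    return w + (w_q - w).detach()

# Toy QAT step: the matmul uses quantized weights, but the gradient
# updates the underlying full-precision weights.
w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(8, 16)
loss = (x @ fake_quant(w).T).pow(2).mean()
loss.backward()
print(w.grad is not None)  # True
```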

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …
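
Since this survey covers low-bit basics, a self-contained sketch of the uniform affine quantize/dequantize round trip that most low-bit schemes start from may help orient the reader; the helper names are hypothetical.

```python
import numpy as np

def quantize(x, n_bits=4):
    # Asymmetric uniform quantization to unsigned n-bit integers.
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(1024).astype(np.float32)
q, s, z = quantize(x)
print(np.abs(x - dequantize(q, s, z)).mean())  # mean quantization error
```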

BiLLM: Pushing the limit of post-training quantization for LLMs

W Huang, Y Liu, H Qin, Y Li, S Zhang, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained large language models (LLMs) exhibit exceptional general language processing
capabilities but come with significant demands on memory and computational resources. As …
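
BiLLM pushes post-training quantization toward 1 bit; its actual method treats salient and non-salient weights differently, but the core primitive it extends is classic weight binarization, sketched here with an illustrative function name.

```python
import numpy as np

def binarize_rowwise(W):
    # 1-bit weights: sign(W) times a per-row scale. The scale that
    # minimizes ||W - a * sign(W)||^2 is the row mean of |W|.
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    return alpha * np.sign(W)

W = np.random.randn(8, 64)
W_bin = binarize_rowwise(W)
print(np.linalg.norm(W - W_bin) / np.linalg.norm(W))  # relative error
```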

How good are low-bit quantized LLaMA3 models? An empirical study

W Huang, X Ma, H Qin, X Zheng, C Lv, H Chen… - arXiv e …, 2024 - ui.adsabs.harvard.edu
Meta's LLaMA family has become one of the most powerful open-source Large Language
Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve …
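
The study measures how quality degrades as bit-width shrinks; a crude stand-in for that kind of sweep is the weight reconstruction error of a quantize-dequantize round trip. This is illustrative only and bears no relation to the paper's actual benchmarks.

```python
import numpy as np

def roundtrip_error(x, n_bits):
    # Relative error after a symmetric uniform quantize-dequantize.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    x_hat = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

x = np.random.randn(4096).astype(np.float32)
for bits in (8, 4, 3, 2):
    print(f"{bits}-bit relative error: {roundtrip_error(x, bits):.4f}")
```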

WKVQuant: Quantizing weight and key/value cache for large language models gains more

Y Yue, Z Yuan, H Duanmu, S Zhou, J Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) face significant deployment challenges due to their
substantial memory requirements and the computational demands of auto-regressive text …
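
The memory pressure here comes largely from the auto-regressive key/value cache. Below is a minimal sketch of per-token asymmetric quantization of a cached K or V tensor; the shapes and helper names are assumed, and this is not WKVQuant's exact scheme.

```python
import numpy as np

def quant_kv(t, n_bits=4):
    # One scale/offset per token vector of a cache tensor shaped
    # (heads, seq_len, head_dim).
    lo = t.min(axis=-1, keepdims=True)
    hi = t.max(axis=-1, keepdims=True)
    scale = (hi - lo) / (2 ** n_bits - 1)
    q = np.round((t - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequant_kv(q, scale, lo):
    return q.astype(np.float32) * scale + lo

k_cache = np.random.randn(8, 128, 64).astype(np.float32)
q, s, lo = quant_kv(k_cache)
print(np.abs(k_cache - dequant_kv(q, s, lo)).max())  # worst-case error
```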

LLaVA-PruMerge: Adaptive token reduction for efficient large multimodal models

Y Shang, M Cai, B Xu, YJ Lee, Y Yan - arXiv preprint arXiv:2403.15388, 2024 - arxiv.org
Large Multimodal Models (LMMs) have shown significant reasoning capabilities by
connecting a visual encoder and a large language model. LMMs typically use a fixed …
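
PruMerge both prunes and merges visual tokens; the sketch below shows only the pruning half, keeping the top-scoring tokens by some saliency signal. The names and the random scores are placeholders, not the paper's method.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.25):
    # Keep the highest-scoring visual tokens, preserving order, so the
    # LLM processes a much shorter multimodal sequence.
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])
    return tokens[keep]

tokens = np.random.randn(576, 1024)  # e.g. 24x24 ViT patch embeddings
scores = np.random.rand(576)         # stand-in for attention saliency
print(prune_tokens(tokens, scores).shape)  # (144, 1024)
```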

APTQ: Attention-aware post-training mixed-precision quantization for large language models

Z Guan, H Huang, Y Su, H Huang, N Wong… - Proceedings of the 61st …, 2024 - dl.acm.org
Large Language Models (LLMs) have greatly advanced the natural language processing
paradigm. However, the high computational load and huge model sizes pose a grand …
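
Mixed-precision PTQ needs a rule for spending a global bit budget across layers. A toy greedy allocator that favors more "sensitive" layers is sketched below; APTQ's own sensitivity metric is attention-aware, and everything here is a generic stand-in.

```python
def allocate_bits(sensitivity, budget, choices=(2, 3, 4, 8)):
    # Greedy mixed-precision assignment: start every layer at the
    # lowest width, then upgrade the most sensitive layers while the
    # total bit budget allows it.
    bits = [min(choices)] * len(sensitivity)
    spent = sum(bits)
    order = sorted(range(len(sensitivity)),
                   key=lambda i: -sensitivity[i])
    for i in order:
        for b in sorted(choices):
            if b > bits[i] and spent + b - bits[i] <= budget:
                spent += b - bits[i]
                bits[i] = b
    return bits

sens = [0.9, 0.1, 0.5, 0.05]           # e.g. attention blocks score high
print(allocate_bits(sens, budget=16))  # -> [8, 2, 4, 2]
```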

BitDistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation

D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
The upscaling of Large Language Models (LLMs) has yielded impressive advances in
natural language processing, yet it also poses significant deployment challenges. Weight …
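
The self-distillation idea is to let the model's own full-precision weights teach its quantized copy. The standard soft-label distillation objective this builds on is sketched below; BitDistiller's actual loss is a refinement of this generic form.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Soft-label distillation: KL divergence between temperature-
    # softened teacher and student distributions, scaled by T^2.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * T * T

teacher = torch.randn(4, 32000)      # full-precision model's logits
student = teacher + 0.1 * torch.randn_like(teacher)  # quantized copy
print(kd_loss(student, teacher).item())
```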

RoLoRA: Fine-tuning rotated outlier-free LLMs for effective weight-activation quantization

X Huang, Z Liu, SY Liu, KT Cheng - arXiv preprint arXiv:2407.08044, 2024 - arxiv.org
Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT)
method, significantly enhances the training efficiency by updating only a small portion of the …
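
The "rotated outlier-free" part relies on an equivalence: multiplying activations and weights by the same orthogonal matrix leaves layer outputs unchanged while spreading outlier channels, making both easier to quantize. A minimal numpy sketch follows, using a random orthogonal matrix rather than the Hadamard-style transforms typically used in this line of work.

```python
import numpy as np

def random_rotation(d, seed=0):
    # Random orthogonal matrix via QR decomposition.
    q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((d, d)))
    return q

d = 256
x = np.random.randn(4, d)
W = np.random.randn(d, d)
W[:, 0] *= 50                        # plant an outlier input channel
R = random_rotation(d)
x_rot, W_rot = x @ R, W @ R          # rotate activations and weights alike

print(np.allclose(x @ W.T, x_rot @ W_rot.T))      # output is unchanged
print(np.abs(W).max() / np.abs(W).std(),
      np.abs(W_rot).max() / np.abs(W_rot).std())  # outlier ratio drops
```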