A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - Transactions of the Association for …, 2024 - direct.mit.edu
Large Language Models (LLMs) have successfully transformed natural language processing
tasks. Yet, their large size and high computational needs pose challenges for …

A survey of low-bit large language models: Basics, systems, and algorithms

R Gong, Y Ding, Z Wang, C Lv, X Zheng, J Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable advancements in natural
language processing, showcasing exceptional performance across various tasks. However …

A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

Survey of different large language model architectures: Trends, benchmarks, and challenges

M Shao, A Basit, R Karri, M Shafique - IEEE Access, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) represent a class of deep learning models adept at
understanding natural language and generating coherent text in response to prompts or …

VPTQ: Extreme low-bit vector post-training quantization for large language models

Y Liu, J Wen, Y Wang, S Ye, LL Zhang, T Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling model size significantly challenges the deployment and inference of Large
Language Models (LLMs). Due to the redundancy in LLM weights, recent research has …
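The VPTQ entry above only names vector post-training quantization before the snippet is cut off. As a rough illustration of the general idea (grouping weights into short vectors and replacing each group with the nearest entry of a learned codebook), the sketch below runs plain k-means over weight groups. It is an assumption-laden toy, not the paper's algorithm; the function name, group size, and codebook width are illustrative choices only.

# Minimal sketch of codebook-based vector quantization of a weight matrix.
# Illustrative only: a generic k-means scheme, not VPTQ's actual method.
import numpy as np

def vector_quantize(W, group_size=4, codebook_bits=8, iters=20, seed=0):
    """Split each row of W into contiguous groups of `group_size` values,
    fit a shared codebook of 2**codebook_bits centroids with plain k-means,
    and return the codebook, per-group indices, and the dequantized matrix."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    assert cols % group_size == 0, "cols must be divisible by group_size"
    vectors = W.reshape(-1, group_size)                # each row is one weight group
    k = 2 ** codebook_bits
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()

    for _ in range(iters):                             # Lloyd iterations
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)                          # nearest centroid per group
        for c in range(k):                             # recompute centroids
            members = vectors[idx == c]
            if len(members):
                codebook[c] = members.mean(0)

    W_hat = codebook[idx].reshape(rows, cols)          # dequantized weights
    return codebook, idx.astype(np.uint16), W_hat

W = np.random.randn(64, 64).astype(np.float32)
codebook, idx, W_hat = vector_quantize(W)
print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))

Storing only the codebook plus the small per-group indices is what makes such schemes "extreme low-bit"; the reconstruction error printed at the end is the quantity post-training quantization methods try to minimize.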

Efficient training and inference: Techniques for large language models using LLaMA

SR Cunningham, D Archambault, A Kung - Authorea Preprints, 2024 - techrxiv.org
Enhancing the efficiency of language models involves optimizing their training and
inference processes to reduce computational demands while maintaining high performance …

ABQ-LLM: Arbitrary-bit quantized inference acceleration for large language models

C Zeng, S Liu, Y Xie, H Liu, X Wang, M Wei… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized natural language processing tasks.
However, their practical application is constrained by substantial memory and computational …

HotaQ: Hardware Oriented Token Adaptive Quantization for Large Language Models

X Shen, Z Han, L Lu, Z Kong, P Dong… - … on Computer-Aided …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) have become popular and are widely used in creative ways
because of their powerful capabilities. However, the substantial model size and complexity …
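The HotaQ entry above mentions token-adaptive quantization, but the snippet stops before any detail. As a loose illustration of what per-token quantization means in general, here is a minimal sketch of symmetric round-to-nearest INT8 activation quantization with one scale per token. This is a generic scheme under my own assumptions, not the paper's hardware-oriented method; the names and bit width are illustrative.

# Minimal sketch of per-token dynamic INT8 activation quantization.
# Illustrative only: a generic scheme, not HotaQ's method.
import numpy as np

def quantize_per_token(x, n_bits=8):
    """Quantize activations x of shape (tokens, hidden) with one symmetric
    round-to-nearest scale per token; returns int8 codes and per-token scales."""
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for INT8
    scales = np.abs(x).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)                  # guard against all-zero tokens
    q = np.clip(np.round(x / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

x = np.random.randn(16, 128).astype(np.float32)        # 16 tokens, hidden size 128
q, s = quantize_per_token(x)
print("max abs dequantization error:", float(np.abs(x - dequantize(q, s)).max()))

Because each token gets its own scale, outlier tokens do not blow up the quantization error of the others, which is the basic motivation behind token-adaptive schemes.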

QEFT: Quantization for Efficient Fine-Tuning of LLMs

C Lee, J Jin, Y Cho, E Park - arXiv preprint arXiv:2410.08661, 2024 - arxiv.org
With the rapid growth in the use of fine-tuning for large language models (LLMs), optimizing
fine-tuning while keeping inference efficient has become highly important. However, this is a …

Impact of ML optimization tactics on greener pre-trained ML models

AG Álvarez, J Castaño, X Franch… - arXiv preprint arXiv …, 2024 - arxiv.org
Background: Given the fast-paced nature of today's technology, which has surpassed
human performance in tasks like image classification, visual reasoning, and English …