On-device language models: A comprehensive review

J Xu, Z Li, W Chen, Q Wang, X Gao, Q Cai… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of large language models (LLMs) revolutionized natural language processing
applications, and running LLMs on edge devices has become increasingly attractive for …

Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient empowerment technique in the machine learning community
that does not require the collection of raw training data and does not require expensive …
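For intuition only, a minimal sketch of the simplest form of weight-space merging: uniform averaging of checkpoints that share an architecture. The helper name, PyTorch state dicts, and file paths are illustrative assumptions, not the survey's specific methods.

```python
# Minimal model-merging sketch: uniform (or weighted) averaging of state dicts.
# Assumes all checkpoints share the same architecture (matching keys and tensor
# shapes) and that the averaged parameters are floating point.
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Return a state dict whose tensors are the weighted average of the inputs."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Hypothetical usage: merge two fine-tuned experts without touching raw training data.
# merged = merge_state_dicts([torch.load("expert_a.pt"), torch.load("expert_b.pt")])
# model.load_state_dict(merged)
```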

Retrieval-augmented mixture of LoRA experts for uploadable machine learning

Z Zhao, L Gan, G Wang, Y Hu, T Shen, H Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models
(LLMs). Its modular and plug-and-play nature allows the integration of various domain …
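As background for the LoRA-based entries in this listing, a minimal sketch of a LoRA adapter wrapped around a frozen linear layer. The class name, rank, and scaling values are illustrative assumptions, not this paper's retrieval-augmented method.

```python
# Minimal LoRA sketch: a frozen pretrained linear layer plus a trainable
# low-rank update, y = W x + (alpha / r) * B A x. Only A and B are trained,
# which is what makes adapters modular and swappable per domain.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the pretrained weights frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Hypothetical usage: wrap an attention projection and train only the adapter.
# layer = LoRALinear(nn.Linear(768, 768), r=8)
```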

SMILE: Zero-shot sparse mixture of low-rank experts construction from pre-trained foundation models

A Tang, L Shen, Y Luo, S Xie, H Hu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep model training on extensive datasets is increasingly becoming cost-prohibitive,
prompting the widespread adoption of deep model fusion techniques to leverage knowledge …

Low-Rank Adaptation for Foundation Models: A Comprehensive Review

M Yang, J Chen, Y Zhang, J Liu, J Zhang, Q Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of foundation models (large-scale neural networks trained on
diverse, extensive datasets) has revolutionized artificial intelligence, enabling unprecedented …

Leveraging open knowledge for advancing task expertise in large language models

Y Yang, Y Qin, T Wu, Z Xu, G Li, P Guo, H Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
The cultivation of expertise for large language models (LLMs) to solve tasks of specific areas
often requires special-purpose tuning with calibrated behaviors on the expected stable …

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

W Zheng, Y Chen, W Zhang, S Kundu, Y Li… - … Models: Evolving AI …, 2024 - openreview.net
Large language models (LLMs) have achieved remarkable success in natural language
processing tasks but suffer from high computational costs during inference, limiting their …

MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts

L Ning, H Lara, M Guo, A Rastogi - arXiv preprint arXiv:2408.01505, 2024 - arxiv.org
Parameter-efficient fine-tuning techniques like Low-Rank Adaptation (LoRA) have
revolutionized the adaptation of large language models (LLMs) to diverse tasks. Recent …

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts

N Gritsch, Q Zhang, A Locatelli, S Hooker… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficiency, specialization, and adaptability to new data distributions are qualities that are
hard to combine in current Large Language Models. The Mixture of Experts (MoE) …
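For reference, a minimal sketch of the token-routed Mixture-of-Experts pattern that several entries in this listing build on: a learned router sends each token to its top-k feed-forward experts and mixes their outputs. The module sizes, top-k of 2, and softmax gating are illustrative assumptions, not the cited paper's architecture.

```python
# Minimal Mixture-of-Experts sketch: a router picks the top-k feed-forward
# experts per token and combines their outputs with softmax-normalized gates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.k = k

    def forward(self, x):                          # x: (num_tokens, d_model)
        gate_logits = self.router(x)               # (num_tokens, num_experts)
        gates, expert_idx = gate_logits.topk(self.k, dim=-1)
        gates = F.softmax(gates, dim=-1)           # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # dispatch: each token visits k experts
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```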

Instant Transformer Adaption via HyperLoRA

R Charakorn, E Cetin, Y Tang… - … Models: Evolving AI for …, 2024 - openreview.net
While Foundation Models provide a general tool for rapid content creation, they regularly
require task-specific adaptation. Traditionally, this exercise involves careful curation of …