Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet?

E Razumovskaia, I Vulić, A Korhonen - arXiv preprint arXiv:2403.01929, 2024 - arxiv.org
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning
(ICL) are three alternative, de facto standard approaches to few-shot learning. ICL has …
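
As a concrete illustration of the ICL setting mentioned in the snippet, the sketch below assembles a k-shot prompt from labeled demonstrations; the instruction, utterances, and intent labels are illustrative placeholders, not examples from the paper.

```python
# Minimal sketch of in-context learning (ICL) prompt construction for a
# few-shot NLU task. The instruction, utterances, and intent labels are
# illustrative placeholders, not drawn from the paper.

def build_icl_prompt(demonstrations, query,
                     instruction="Classify the intent of the utterance."):
    """Concatenate an instruction, k labeled demonstrations, and the query."""
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines.append(f"Utterance: {text}")
        lines.append(f"Intent: {label}")
        lines.append("")
    lines.append(f"Utterance: {query}")
    lines.append("Intent:")  # the model completes the label in-context
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        ("Wake me up at 7 tomorrow", "set_alarm"),
        ("What's the weather in Helsinki?", "get_weather"),
    ]
    print(build_icl_prompt(demos, "Remind me to call mum at noon"))
```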

Language and task arithmetic with parameter-efficient layers for zero-shot summarization

A Chronopoulou, J Pfeiffer, J Maynez, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the
performance of large language models (LLMs) on the downstream task. However, there are …
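
The arithmetic the title refers to can be sketched as element-wise composition of parameter-efficient deltas; the add/subtract rule below is a simplified assumption for illustration and may differ from the paper's exact formulation.

```python
import torch

# Hedged sketch of language/task arithmetic over parameter-efficient deltas:
# compose a "language" delta and a "task" delta (optionally subtracting the
# source-language delta) into one adapter for zero-shot transfer. The exact
# composition used in the paper may differ; only the arithmetic is shown.

def combine_deltas(lang_delta, task_delta, src_lang_delta=None, alpha=1.0):
    """Per-tensor rule: new = lang + alpha * task - src_lang (if given)."""
    combined = {}
    for name, lang_w in lang_delta.items():
        w = lang_w + alpha * task_delta[name]
        if src_lang_delta is not None:
            w = w - src_lang_delta[name]
        combined[name] = w
    return combined


# Toy usage with random tensors standing in for adapter weights.
shapes = {"layer0.adapter.weight": (8, 8), "layer1.adapter.weight": (8, 8)}
lang = {k: torch.randn(*s) for k, s in shapes.items()}
task = {k: torch.randn(*s) for k, s in shapes.items()}
zero_shot_adapter = combine_deltas(lang, task)
```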

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

Y Li, S Han, S Ji - arXiv preprint arXiv:2405.15179, 2024 - arxiv.org
As the adoption of large language models increases and the need for per-user or per-task
model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low …
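
The truncated sentence refers to low-rank adaptation (LoRA); for context, the sketch below implements a plain LoRA linear layer, not the vector-bank parameterization that VB-LoRA introduces.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update.

    Standard LoRA shown for context only; VB-LoRA additionally composes the
    low-rank factors from a shared vector bank, which is not reproduced here.
    """

    def __init__(self, in_features, out_features, r=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


x = torch.randn(2, 16)
print(LoRALinear(16, 32)(x).shape)  # torch.Size([2, 32])
```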

Leveraging open knowledge for advancing task expertise in large language models

Y Yang, Y Qin, T Wu, Z Xu, G Li, P Guo, H Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Cultivating expertise in large language models (LLMs) for tasks in specific domains
often requires special-purpose tuning with calibrated behaviors on the expected stable …

Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?

N Asadi, M Beitollahi, Y Khalil, Y Li, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning stands as the standard for efficiently fine-tuning large
language and vision models on downstream tasks. Specifically, the efficiency of low-rank …
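
The question studied here, whether combining independently trained parameter-efficient modules helps few-shot transfer, can be made concrete with a uniform-averaging baseline; the merge rule below is an assumed illustration, not the paper's evaluation protocol.

```python
import torch

# Hedged sketch: merge several independently trained LoRA modules (stored as
# {parameter name: tensor} dicts) by uniform averaging before evaluating
# few-shot transfer. Uniform averaging is only one possible merge rule.

def average_modules(modules):
    merged = {}
    for name in modules[0]:
        merged[name] = torch.stack([m[name] for m in modules]).mean(dim=0)
    return merged


# Toy usage: three modules with a single low-rank factor each.
mods = [{"layer0.lora_A": torch.randn(4, 16)} for _ in range(3)]
merged = average_modules(mods)
```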

Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

JY Choi, JR Park, I Park, J Cho, A No… - arXiv preprint arXiv …, 2024 - arxiv.org
Current state-of-the-art diffusion models employ U-Net architectures containing
convolutional and (qkv) self-attention layers. The U-Net processes images while being …
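
The drop-in conditioning in the title amounts to attaching LoRA branches to the attention projections; the sketch below wraps the q/k/v linear layers of a toy attention block, with module names and sizes chosen for illustration rather than taken from a diffusion codebase.

```python
import torch
import torch.nn as nn

class LoRABranch(nn.Module):
    """Wrap a frozen linear projection with a trainable low-rank branch."""

    def __init__(self, base, r=4, alpha=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)


class ToyAttention(nn.Module):
    """Stand-in for a U-Net self-attention block with q/k/v projections."""

    def __init__(self, dim=32):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)


def add_lora_to_attention(attn, proj_names=("to_q", "to_k", "to_v")):
    """Drop-in: replace the named projections with LoRA-wrapped versions."""
    for name in proj_names:
        setattr(attn, name, LoRABranch(getattr(attn, name)))
    return attn


attn = add_lora_to_attention(ToyAttention())
print(attn.to_q(torch.randn(2, 32)).shape)  # torch.Size([2, 32])
```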

Mixture of Experts Using Tensor Products

Z Su, F Mo, P Tiwari, B Wang, JY Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
In multi-task learning, the conventional approach involves training a model on multiple tasks
simultaneously. However, the training signals from different tasks can interfere with one …
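
For context on the mixture-of-experts framing, the sketch below shows a plain softmax-gated MoE layer; it does not implement the tensor-product interaction between experts that the paper proposes.

```python
import torch
import torch.nn as nn

# Hedged sketch of a softmax-gated mixture-of-experts layer. The paper's
# tensor-product interactions are not reproduced; this only shows the
# generic MoE structure on which such methods build.

class SoftMoE(nn.Module):
    def __init__(self, d_model=32, d_hidden=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, d_model)
        gates = torch.softmax(self.router(x), dim=-1)             # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, d_model)
        return (gates.unsqueeze(-1) * outs).sum(dim=1)            # weighted sum over experts


print(SoftMoE()(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```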

AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery

Y Qu, Y Tang, C Zhang, W Zhang - arXiv preprint arXiv:2410.21705, 2024 - arxiv.org
Unlike the traditional semi-supervised learning paradigm, which is constrained by the
closed-world assumption, Generalized Category Discovery (GCD) presumes that the …

Glider: Global and Local Instruction-Driven Expert Router

P Li, P Yadav, J Yoon, J Peng, YL Sung… - arXiv preprint arXiv …, 2024 - arxiv.org
The availability of performant pre-trained models has led to a proliferation of fine-tuned
expert models that are specialized to particular domains. This has enabled the creation of …
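
The routing idea described here, selecting among specialized expert models per input, can be sketched with a simple embedding-similarity router; the embedding function and expert registry below are placeholders, not Glider's global/local routing mechanism.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of instruction-driven routing over a pool of expert models:
# pick the expert whose stored profile embedding is most similar to the query
# embedding. `embed` is a deterministic stand-in for a sentence encoder;
# Glider's router combines global semantic and local token-level signals.

def embed(text, dim=64):
    g = torch.Generator().manual_seed(abs(hash(text)) % (2 ** 31))
    return F.normalize(torch.randn(dim, generator=g), dim=0)


def route(query, expert_profiles):
    """Return the name of the expert whose profile best matches the query."""
    q = embed(query)
    scores = {name: torch.dot(q, embed(profile)).item()
              for name, profile in expert_profiles.items()}
    return max(scores, key=scores.get)


experts = {
    "code_expert": "fix and explain source code",
    "math_expert": "solve math word problems step by step",
}
print(route("What is the derivative of x**2?", experts))
```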

Red Teaming for Multimodal Large Language Models: A Survey

M Mahato, A Kumar, K Singh, B Kukreja, J Nabi - Authorea Preprints, 2024 - techrxiv.org
As Generative AI becomes more prevalent, its vulnerability to security threats grows. This
study conducts a thorough exploration of red teaming methods within the domain of …