Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet?

E Razumovskaia, I Vulić, A Korhonen - arXiv preprint arXiv:2403.01929, 2024 - arxiv.org
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning
(ICL) are three alternative, de facto standard approaches to few-shot learning. ICL has …
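
As a concrete illustration of the ICL setting mentioned in the snippet, the sketch below assembles a k-shot prompt from labeled demonstrations; the instruction, utterances, and intent labels are illustrative placeholders, not examples from the paper.

```python
# Minimal sketch of in-context learning (ICL) prompt construction for a
# few-shot NLU task. The instruction, utterances, and intent labels are
# illustrative placeholders, not drawn from the paper.

def build_icl_prompt(demonstrations, query,
                     instruction="Classify the intent of the utterance."):
    """Concatenate an instruction, k labeled demonstrations, and the query."""
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines.append(f"Utterance: {text}")
        lines.append(f"Intent: {label}")
        lines.append("")
    lines.append(f"Utterance: {query}")
    lines.append("Intent:")  # the model completes the label in-context
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        ("Wake me up at 7 tomorrow", "set_alarm"),
        ("What's the weather in Helsinki?", "get_weather"),
    ]
    print(build_icl_prompt(demos, "Remind me to call mum at noon"))
```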

Language and task arithmetic with parameter-efficient layers for zero-shot summarization

A Chronopoulou, J Pfeiffer, J Maynez, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the
performance of large language models (LLMs) on the downstream task. However, there are …
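
The arithmetic the title refers to can be sketched as element-wise composition of parameter-efficient deltas; the add/subtract rule below is a simplified assumption for illustration and may differ from the paper's exact formulation.

```python
import torch

# Hedged sketch of language/task arithmetic over parameter-efficient deltas:
# compose a "language" delta and a "task" delta (optionally subtracting the
# source-language delta) into one adapter for zero-shot transfer. The exact
# composition used in the paper may differ; only the arithmetic is shown.

def combine_deltas(lang_delta, task_delta, src_lang_delta=None, alpha=1.0):
    """Per-tensor rule: new = lang + alpha * task - src_lang (if given)."""
    combined = {}
    for name, lang_w in lang_delta.items():
        w = lang_w + alpha * task_delta[name]
        if src_lang_delta is not None:
            w = w - src_lang_delta[name]
        combined[name] = w
    return combined


# Toy usage with random tensors standing in for adapter weights.
shapes = {"layer0.adapter.weight": (8, 8), "layer1.adapter.weight": (8, 8)}
lang = {k: torch.randn(*s) for k, s in shapes.items()}
task = {k: torch.randn(*s) for k, s in shapes.items()}
zero_shot_adapter = combine_deltas(lang, task)
```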

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

Y Li, S Han, S Ji - arXiv preprint arXiv:2405.15179, 2024 - arxiv.org
As the adoption of large language models increases and the need for per-user or per-task
model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low …
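
The truncated sentence refers to low-rank adaptation (LoRA); for context, the sketch below implements a plain LoRA linear layer, not the vector-bank parameterization that VB-LoRA introduces.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update.

    Standard LoRA shown for context only; VB-LoRA additionally composes the
    low-rank factors from a shared vector bank, which is not reproduced here.
    """

    def __init__(self, in_features, out_features, r=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


x = torch.randn(2, 16)
print(LoRALinear(16, 32)(x).shape)  # torch.Size([2, 32])
```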

Leveraging open knowledge for advancing task expertise in large language models

Y Yang, Y Qin, T Wu, Z Xu, G Li, P Guo, H Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Cultivating expertise in large language models (LLMs) for tasks in specific domains
often requires special-purpose tuning with calibrated behaviors on the expected stable …

Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?

N Asadi, M Beitollahi, Y Khalil, Y Li, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning stands as the standard for efficiently fine-tuning large
language and vision models on downstream tasks. Specifically, the efficiency of low-rank …
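
The question studied here, whether combining independently trained parameter-efficient modules helps few-shot transfer, can be made concrete with a uniform-averaging baseline; the merge rule below is an assumed illustration, not the paper's evaluation protocol.

```python
import torch

# Hedged sketch: merge several independently trained LoRA modules (stored as
# {parameter name: tensor} dicts) by uniform averaging before evaluating
# few-shot transfer. Uniform averaging is only one possible merge rule.

def average_modules(modules):
    merged = {}
    for name in modules[0]:
        merged[name] = torch.stack([m[name] for m in modules]).mean(dim=0)
    return merged


# Toy usage: three modules with a single low-rank factor each.
mods = [{"layer0.lora_A": torch.randn(4, 16)} for _ in range(3)]
merged = average_modules(mods)
```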

Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

JY Choi, JR Park, I Park, J Cho, A No… - arXiv preprint arXiv …, 2024 - arxiv.org
Current state-of-the-art diffusion models employ U-Net architectures containing
convolutional and (qkv) self-attention layers. The U-Net processes images while being …
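
The drop-in conditioning in the title amounts to attaching LoRA branches to the attention projections; the sketch below wraps the q/k/v linear layers of a toy attention block, with module names and sizes chosen for illustration rather than taken from a diffusion codebase.

```python
import torch
import torch.nn as nn

class LoRABranch(nn.Module):
    """Wrap a frozen linear projection with a trainable low-rank branch."""

    def __init__(self, base, r=4, alpha=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)


class ToyAttention(nn.Module):
    """Stand-in for a U-Net self-attention block with q/k/v projections."""

    def __init__(self, dim=32):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)


def add_lora_to_attention(attn, proj_names=("to_q", "to_k", "to_v")):
    """Drop-in: replace the named projections with LoRA-wrapped versions."""
    for name in proj_names:
        setattr(attn, name, LoRABranch(getattr(attn, name)))
    return attn


attn = add_lora_to_attention(ToyAttention())
print(attn.to_q(torch.randn(2, 32)).shape)  # torch.Size([2, 32])
```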

Mixture of Experts Using Tensor Products

Z Su, F Mo, P Tiwari, B Wang, JY Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
In multi-task learning, the conventional approach involves training a model on multiple tasks
simultaneously. However, the training signals from different tasks can interfere with one …
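
For context on the mixture-of-experts framing, the sketch below shows a plain softmax-gated MoE layer; it does not implement the tensor-product interaction between experts that the paper proposes.

```python
import torch
import torch.nn as nn

# Hedged sketch of a softmax-gated mixture-of-experts layer. The paper's
# tensor-product interactions are not reproduced; this only shows the
# generic MoE structure on which such methods build.

class SoftMoE(nn.Module):
    def __init__(self, d_model=32, d_hidden=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, d_model)
        gates = torch.softmax(self.router(x), dim=-1)             # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, d_model)
        return (gates.unsqueeze(-1) * outs).sum(dim=1)            # weighted sum over experts


print(SoftMoE()(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```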

AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery

Y Qu, Y Tang, C Zhang, W Zhang - arXiv preprint arXiv:2410.21705, 2024 - arxiv.org
Unlike the traditional semi-supervised learning paradigm, which is constrained by the
closed-world assumption, Generalized Category Discovery (GCD) presumes that the …

Glider: Global and Local Instruction-Driven Expert Router

P Li, P Yadav, J Yoon, J Peng, YL Sung… - arXiv preprint arXiv …, 2024 - arxiv.org
The availability of performant pre-trained models has led to a proliferation of fine-tuned
expert models that are specialized to particular domains. This has enabled the creation of …
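
The routing idea described here, selecting among specialized expert models per input, can be sketched with a simple embedding-similarity router; the embedding function and expert registry below are placeholders, not Glider's global/local routing mechanism.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of instruction-driven routing over a pool of expert models:
# pick the expert whose stored profile embedding is most similar to the query
# embedding. `embed` is a deterministic stand-in for a sentence encoder;
# Glider's router combines global semantic and local token-level signals.

def embed(text, dim=64):
    g = torch.Generator().manual_seed(abs(hash(text)) % (2 ** 31))
    return F.normalize(torch.randn(dim, generator=g), dim=0)


def route(query, expert_profiles):
    """Return the name of the expert whose profile best matches the query."""
    q = embed(query)
    scores = {name: torch.dot(q, embed(profile)).item()
              for name, profile in expert_profiles.items()}
    return max(scores, key=scores.get)


experts = {
    "code_expert": "fix and explain source code",
    "math_expert": "solve math word problems step by step",
}
print(route("What is the derivative of x**2?", experts))
```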

Red Teaming for Multimodal Large Language Models: A Survey

M Mahato, A Kumar, K Singh, B Kukreja, J Nabi - Authorea Preprints, 2024 - techrxiv.org
As Generative AI becomes more prevalent, its vulnerability to security threats grows. This
study conducts a thorough exploration of red teaming methods within the domain of …