A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization

Y Wang, Z Yu, Z Zeng, L Yang, C Wang, H Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Instruction tuning large language models (LLMs) remains a challenging task, owing to the
complexity of hyperparameter selection and the difficulty involved in evaluating the tuned …

Understanding and mitigating the label noise in pre-training on downstream tasks

H Chen, J Wang, A Shah, R Tao, H Wei, X Xie… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-training on large-scale datasets and then fine-tuning on downstream tasks have
become a standard practice in deep learning. However, pre-training data often contain label …

A hard-to-beat baseline for training-free CLIP-based adaptation

Z Wang, J Liang, L Sheng, R He, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable
zero-shot capacity. Recent research has focused on developing efficient fine-tuning …

Parameter-efficient long-tailed recognition

JX Shi, T Wei, Z Zhou, XY Han, JJ Shao… - arXiv preprint arXiv …, 2023 - arxiv.org
The" pre-training and fine-tuning" paradigm in addressing long-tailed recognition tasks has
sparked significant interest since the emergence of large vision-language models like the …

NovelQA: A benchmark for long-range novel question answering

C Wang, R Ning, B Pan, T Wu, Q Guo, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in
natural language processing, particularly in understanding and processing long-context …

Learning with noisy foundation models

H Chen, J Wang, Z Wang, R Tao, H Wei, X Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models are usually pre-trained on large-scale datasets and then adapted to
downstream tasks through tuning. However, the large-scale pre-training datasets, often …

ZooPFL: Exploring black-box foundation models for personalized federated learning

W Lu, H Yu, J Wang, D Teney, H Wang, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
When personalized federated learning (FL) meets large foundation models, new challenges
arise from various limitations in resources. In addition to typical limitations such as data …

CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios

Z Zeng, Y Wang, R Xie, W Ye, S Zhang - arXiv preprint arXiv:2403.19287, 2024 - arxiv.org
In the evolving landscape of large language models (LLMs) tailored for software
engineering, the need for benchmarks that accurately reflect real-world development …

Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts

JX Shi, T Wei, Z Zhou, JJ Shao, XY Han… - Forty-first International …, 2024 - openreview.net
The fine-tuning paradigm in addressing long-tail learning tasks has sparked significant
interest since the emergence of foundation models. Nonetheless, how fine-tuning impacts …