A survey of active learning for natural language processing

Z Zhang, E Strubell, E Hovy - arXiv preprint arXiv:2210.10109, 2022 - arxiv.org
In this work, we provide a survey of active learning (AL) and its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …

Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

A survey of deep active learning for foundation models

T Wan, K Xu, T Yu, X Wang, D Feng, B Ding… - Intelligent …, 2023 - spj.science.org
Active learning (AL) is an effective sample selection approach that annotates only a subset
of the training data to address the challenge of data annotation, and deep learning (DL) is …

On the limitations of simulating active learning

K Margatina, N Aletras - arXiv preprint arXiv:2305.13342, 2023 - arxiv.org
Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects
informative unlabeled data for human annotation, aiming to improve over random sampling …
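
As a concrete illustration of the human-and-model-in-the-loop paradigm this entry describes, below is a minimal pool-based active learning sketch using least-confidence sampling. The classifier, dataset, seed size, and query batch size are illustrative assumptions, not the setup of any surveyed paper.

```python
# Minimal pool-based active learning loop with least-confidence sampling.
# Classifier, data, and batch sizes are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(10))                          # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])
    # Least confidence: query examples whose top predicted class
    # probability is lowest, i.e. where the model is most uncertain.
    uncertainty = 1.0 - probs.max(axis=1)
    query = np.argsort(-uncertainty)[:20]          # query batch of 20
    newly_labeled = [pool[i] for i in query]       # stands in for human annotation
    labeled.extend(newly_labeled)
    pool = [i for i in pool if i not in newly_labeled]
    print(f"round {round_}: {len(labeled)} labeled, acc={clf.score(X, y):.3f}")
```

The same loop structure underlies most query strategies surveyed above; only the scoring of pool examples (here, least confidence) changes.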

Not all preference pairs are created equal: A recipe for annotation-efficient iterative preference learning

S Yang, L Cui, D Cai, X Huang, S Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Iterative preference learning, though it yields superior performance, requires online
annotated preference labels. In this work, we study strategies for selecting worth-annotating …
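
One plausible criterion for annotation-efficient pair selection is to send only ambiguous pairs (those where a proxy reward model nearly ties) to human annotators. The sketch below illustrates that heuristic with random placeholder scores; it is an assumed example, not the recipe proposed in this paper.

```python
# Illustrative margin-based selection of preference pairs for annotation.
# Proxy reward scores are random stand-ins; the low-margin heuristic is
# an assumption for illustration, not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 500
score_a = rng.normal(size=n_pairs)   # proxy reward for response A
score_b = rng.normal(size=n_pairs)   # proxy reward for response B

# Near-tied pairs are the most ambiguous, so human labels there are
# (heuristically) the most informative per annotation dollar.
margin = np.abs(score_a - score_b)
budget = 50
to_annotate = np.argsort(margin)[:budget]
print(f"selected {len(to_annotate)} low-margin pairs for annotation")
```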

Investigating multi-source active learning for natural language inference

A Snijders, D Kiela, K Margatina - arXiv preprint arXiv:2302.06976, 2023 - arxiv.org
In recent years, active learning has been successfully applied to an array of NLP tasks.
However, prior work often assumes that training and test data are drawn from the same …

Active learning for natural language generation

Y Perlitz, A Gera, M Shmueli-Scheuer… - arXiv preprint arXiv …, 2023 - arxiv.org
The field of Natural Language Generation (NLG) suffers from a severe shortage of labeled
data due to the extremely expensive and time-consuming process involved in manual …

Combining self-supervised learning and active learning for disfluency detection

S Wang, Z Wang, W Che, S Zhao, T Liu - Transactions on Asian and …, 2021 - dl.acm.org
Spoken language is fundamentally different from written language in that it contains
frequent disfluencies, or parts of an utterance that are corrected by the speaker. Disfluency …

STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models

L Zhang, J Wu, D Zhou, G Xu - arXiv preprint arXiv:2403.01165, 2024 - arxiv.org
Though Large Language Models (LLMs) have demonstrated powerful few-shot learning
capabilities through prompting methods, supervised training is still necessary for complex …
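
For context on the LoRA component this entry builds on, here is a minimal LoRA linear layer in PyTorch: the frozen pretrained weight is augmented with a trainable low-rank update scaled by alpha / r. This sketches generic LoRA, not STAR's constrained variant or its dynamic active learning loop; the dimensions and hyperparameters are assumptions.

```python
# Minimal LoRA linear layer: frozen base weight plus a trainable
# low-rank update (alpha / r) * B @ A. Generic LoRA only, not STAR.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # freeze pretrained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: identity at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
out = layer(torch.randn(4, 768))   # only A and B receive gradients
```

Because only A and B are trainable, the number of updated parameters is a small fraction of the full weight matrix, which is what makes LoRA-style fine-tuning data- and memory-efficient.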

DeMuX: Data-efficient Multilingual Learning

S Khanuja, S Gowriraj, L Dery, G Neubig - arXiv preprint arXiv:2311.06379, 2023 - arxiv.org
We consider the task of optimally fine-tuning pre-trained multilingual models, given small
amounts of unlabelled target data and an annotation budget. In this paper, we introduce …
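
To make the budgeted-selection setting concrete, the sketch below picks the candidate source examples whose embeddings lie closest to a small target set, up to an annotation budget. The random embeddings and centroid-distance heuristic are illustrative assumptions, not DeMuX's actual selection objective.

```python
# Illustrative budgeted data selection: choose source examples whose
# embeddings are nearest to the target data. Random placeholder
# embeddings; the distance heuristic is an assumption, not DeMuX.
import numpy as np

rng = np.random.default_rng(0)
source = rng.normal(size=(10_000, 256))   # candidate multilingual pool
target = rng.normal(size=(100, 256))      # small unlabelled target set

centroid = target.mean(axis=0)
dist = np.linalg.norm(source - centroid, axis=1)
budget = 500
selected = np.argsort(dist)[:budget]      # annotate only these examples
print(f"selected {len(selected)} source examples under the budget")
```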