A survey on data selection for language models

A Albalak, Y Elazar, SM Xie, S Longpre… - arXiv preprint arXiv …, 2024 - arxiv.org
A major factor in the recent success of large language models is the use of enormous and
ever-growing text datasets for unsupervised pre-training. However, naively training a model …

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …

The responsible foundation model development cheatsheet: A review of tools & resources

S Longpre, S Biderman, A Albalak… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation model development attracts a rapidly expanding body of contributors, scientists,
and applications. To help shape responsible development practices, we introduce the …

Attentional mixtures of soft prompt tuning for parameter-efficient multi-task knowledge sharing

A Asai, M Salehi, ME Peters… - arXiv preprint arXiv …, 2022 - homes.cs.washington.edu
This work introduces ATTEMPT (ATTEntional Mixture of Prompt Tuning), a new modular,
multi-task, and parameter-efficient language model (LM) tuning approach that combines …

Improving few-shot generalization by exploring and exploiting auxiliary data

A Albalak, CA Raffel, WY Wang - Advances in Neural …, 2024 - proceedings.neurips.cc
Few-shot learning is valuable in many real-world applications, but learning a generalizable
model without overfitting to the few labeled datapoints is challenging. In this work, we focus …

Lower bounds on the expressivity of recurrent neural language models

A Svete, F Nowak, AM Sahabdeen… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent successes and spread of large neural language models (LMs) call for a thorough
understanding of their computational ability. Describing their computational abilities through …

Leveraging machine-generated rationales to facilitate social meaning detection in conversations

R Dutt, Z Wu, K Shi, D Sheth, P Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a generalizable classification approach that leverages Large Language Models
(LLMs) to facilitate the detection of implicitly encoded social meaning in conversations. We …

TaskWeb: Selecting better source tasks for multi-task NLP

J Kim, A Asai, G Ilharco, H Hajishirzi - arXiv preprint arXiv:2305.13256, 2023 - arxiv.org
Recent work in NLP has shown promising results in training models on large numbers of
tasks to achieve better generalization. However, it is not well-understood how tasks are …

D-REX: Dialogue relation extraction with explanations

A Albalak, V Embar, YL Tuan, L Getoor… - arXiv preprint arXiv …, 2021 - arxiv.org
Existing research studies on cross-sentence relation extraction in long-form multi-party
conversations aim to improve relation extraction without considering the explainability of …

Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study

M Reuver, S Verberne, A Fokkens - arXiv preprint arXiv:2404.03987, 2024 - arxiv.org
For a viewpoint-diverse news recommender, identifying whether two news articles express
the same viewpoint is essential. One way to determine "same or different" viewpoint is …