Self-Instruct: Aligning language models with self-generated instructions

Y Wang, Y Kordi, S Mishra, A Liu, NA Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
Large" instruction-tuned" language models (ie, finetuned to respond to instructions) have
demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they …

Pre-trained language models for interactive decision-making

S Li, X Puig, C Paxton, Y Du, C Wang… - Advances in …, 2022 - proceedings.neurips.cc
Language model (LM) pre-training is useful in many language processing tasks.
But can pre-trained LMs be further leveraged for more general machine learning problems …

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in neural …, 2019 - proceedings.neurips.cc
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …

FlauBERT: Unsupervised language model pre-training for French

H Le, L Vial, J Frej, V Segonne, M Coavoux… - arXiv preprint arXiv …, 2019 - arxiv.org
Language models have become a key step to achieve state-of-the-art results in many
different Natural Language Processing (NLP) tasks. Leveraging the huge amount of …

Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models

M Jeong, J Sohn, M Sung, J Kang - Bioinformatics, 2024 - academic.oup.com
Recent proprietary large language models (LLMs), such as GPT-4, have achieved a
milestone in tackling diverse challenges in the biomedical domain, ranging from multiple …

Skill-it! A data-driven skills framework for understanding and training language models

M Chen, N Roberts, K Bhatia, J Wang… - Advances in …, 2024 - proceedings.neurips.cc
The quality of training data impacts the performance of pre-trained large language models
(LMs). Given a fixed budget of tokens, we study how to best select data that leads to good …

IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP

F Koto, A Rahimi, JH Lau, T Baldwin - arXiv preprint arXiv:2011.00677, 2020 - arxiv.org
Although the Indonesian language is spoken by almost 200 million people and is the 10th
most spoken language in the world, it is under-represented in NLP research. Previous work …

75 languages, 1 model: Parsing Universal Dependencies universally

D Kondratyuk, M Straka - arXiv preprint arXiv:1904.02099, 2019 - arxiv.org
We present UDify, a multilingual multi-task model capable of accurately predicting universal
part-of-speech, morphological features, lemmas, and dependency trees simultaneously for …

The CoT Collection: Improving zero-shot and few-shot learning of language models via chain-of-thought fine-tuning

S Kim, SJ Joo, D Kim, J Jang, S Ye, J Shin… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) with less than 100B parameters are known to perform poorly on
chain-of-thought (CoT) reasoning in contrast to large LMs when solving unseen tasks. In this …