Self-Instruct: Aligning language models with self-generated instructions

Y Wang, Y Kordi, S Mishra, A Liu, NA Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
Large" instruction-tuned" language models (ie, finetuned to respond to instructions) have
demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they …

Pre-trained language models for interactive decision-making

S Li, X Puig, C Paxton, Y Du, C Wang… - Advances in …, 2022 - proceedings.neurips.cc
Language model (LM) pre-training is useful in many language processing tasks.
But can pre-trained LMs be further leveraged for more general machine learning problems …

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …

SuperGLUE: A stickier benchmark for general-purpose language understanding systems

A Wang, Y Pruksachatkun, N Nangia… - Advances in neural …, 2019 - proceedings.neurips.cc
In the last year, new models and methods for pretraining and transfer learning have driven
striking performance improvements across a range of language understanding tasks. The …

FlauBERT: Unsupervised language model pre-training for French

H Le, L Vial, J Frej, V Segonne, M Coavoux… - arXiv preprint arXiv …, 2019 - arxiv.org
Language models have become a key step to achieve state-of-the-art results in many
different Natural Language Processing (NLP) tasks. Leveraging the huge amount of …

Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models

M Jeong, J Sohn, M Sung, J Kang - Bioinformatics, 2024 - academic.oup.com
Recent proprietary large language models (LLMs), such as GPT-4, have achieved a
milestone in tackling diverse challenges in the biomedical domain, ranging from multiple …

Skill-it! A data-driven skills framework for understanding and training language models

M Chen, N Roberts, K Bhatia, J Wang… - Advances in …, 2024 - proceedings.neurips.cc
The quality of training data impacts the performance of pre-trained large language models
(LMs). Given a fixed budget of tokens, we study how to best select data that leads to good …

IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP

F Koto, A Rahimi, JH Lau, T Baldwin - arXiv preprint arXiv:2011.00677, 2020 - arxiv.org
Although the Indonesian language is spoken by almost 200 million people and is the 10th
most spoken language in the world, it is under-represented in NLP research. Previous work …

75 languages, 1 model: Parsing Universal Dependencies universally

D Kondratyuk, M Straka - arXiv preprint arXiv:1904.02099, 2019 - arxiv.org
We present UDify, a multilingual multi-task model capable of accurately predicting universal
part-of-speech, morphological features, lemmas, and dependency trees simultaneously for …

The CoT Collection: Improving zero-shot and few-shot learning of language models via chain-of-thought fine-tuning

S Kim, SJ Joo, D Kim, J Jang, S Ye, J Shin… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) with less than 100B parameters are known to perform poorly on
chain-of-thought (CoT) reasoning in contrast to large LMs when solving unseen tasks. In this …