QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Symbolic discovery of optimization algorithms

X Chen, C Liang, D Huang, E Real… - Advances in neural …, 2024 - proceedings.neurips.cc
We present a method to formulate algorithm discovery as program search, and apply it to
discover optimization algorithms for deep neural network training. We leverage efficient …

Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Rethinking the role of demonstrations: What makes in-context learning work?

S Min, X Lyu, A Holtzman, M Artetxe, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LMs) are able to in-context learn -- perform a new task via inference
alone by conditioning on a few input-label pairs (demonstrations) and making predictions for …

MetaICL: Learning to learn in context

S Min, M Lewis, L Zettlemoyer, H Hajishirzi - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training
framework for few-shot learning where a pretrained language model is tuned to do in …

Language models are multilingual chain-of-thought reasoners

F Shi, M Suzgun, M Freitag, X Wang, S Srivats… - arXiv preprint arXiv …, 2022 - arxiv.org
We evaluate the reasoning abilities of large language models in multilingual settings. We
introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating …

Delta tuning: A comprehensive study of parameter-efficient methods for pre-trained language models

N Ding, Y Qin, G Yang, F Wei, Z Yang, Y Su… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive
adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining …

It's not just size that matters: Small language models are also few-shot learners

T Schick, H Schütze - arXiv preprint arXiv:2009.07118, 2020 - arxiv.org
When scaled to hundreds of billions of parameters, pretrained language models such as
GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous …

IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages

D Kakwani, A Kunchukuttan, S Golla… - Findings of the …, 2020 - aclanthology.org
In this paper, we introduce NLP resources for 11 major Indian languages from two major
language families. These resources include: (a) large-scale sentence-level monolingual …

The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only

G Penedo, Q Malartic, D Hesslow… - Advances in …, 2023 - proceedings.neurips.cc
Large language models are commonly trained on a mixture of filtered web data and
curated "high-quality" corpora, such as social media conversations, books, or technical …