Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models

J Ni, GH Abrego, N Constant, J Ma, KB Hall… - arXiv preprint arXiv …, 2021 - arxiv.org
We provide the first exploration of sentence embeddings from text-to-text transformers (T5).
Sentence embeddings are broadly useful for language processing tasks. While T5 achieves …

Compacter: Efficient low-rank hypercomplex adapter layers

R Karimi Mahabadi, J Henderson… - Advances in Neural …, 2021 - proceedings.neurips.cc
Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the
standard method for achieving state-of-the-art performance on NLP benchmarks. However …

mT5: A massively multilingual pre-trained text-to-text transformer

L Xue, N Constant, A Roberts, M Kale… - arXiv preprint arXiv …, 2020 - arxiv.org
The recent" Text-to-Text Transfer Transformer"(T5) leveraged a unified text-to-text format and
scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this …

COMET-22: Unbabel-IST 2022 submission for the metrics shared task

R Rei, JGC De Souza, D Alves, C Zerva… - Proceedings of the …, 2022 - aclanthology.org
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission, dubbed COMET-22, is an ensemble between a …

Results of the WMT21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain

M Freitag, R Rei, N Mathur, C Lo… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the results of the WMT21 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT21 News Translation …

Cramming: Training a Language Model on a single GPU in one day

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

SemEval-2022 task 11: Multilingual complex named entity recognition (MultiCoNER)

S Malmasi, A Fang, B Fetahu, S Kar… - Proceedings of the …, 2022 - aclanthology.org
We present the findings of SemEval-2022 Task 11 on Multilingual Complex Named Entity
Recognition MULTICONER. Divided into 13 tracks, the task focused on methods to identify …

SemEval-2022 task 6: iSarcasmEval, intended sarcasm detection in English and Arabic

IA Farha, S Oprea, S Wilson… - The 16th International …, 2022 - research.ed.ac.uk
iSarcasmEval is the first shared task to target intended sarcasm detection: the data
for this task was provided and labelled by the authors of the texts themselves. Such an …

Lifting the curse of multilinguality by pre-training modular transformers

J Pfeiffer, N Goyal, XV Lin, X Li, J Cross… - arXiv preprint arXiv …, 2022 - arxiv.org
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which
causes per-language performance to drop as they cover more languages. We address this …

Charformer: Fast character transformers via gradient-based subword tokenization

Y Tay, VQ Tran, S Ruder, J Gupta, HW Chung… - arXiv preprint arXiv …, 2021 - arxiv.org
State-of-the-art models in natural language processing rely on separate rigid subword
tokenization algorithms, which limit their generalization ability and adaptation to new …