Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models

J Ni, GH Abrego, N Constant, J Ma, KB Hall… - arXiv preprint arXiv …, 2021 - arxiv.org
We provide the first exploration of sentence embeddings from text-to-text transformers (T5).
Sentence embeddings are broadly useful for language processing tasks. While T5 achieves …

Compacter: Efficient low-rank hypercomplex adapter layers

R Karimi Mahabadi, J Henderson… - Advances in Neural …, 2021 - proceedings.neurips.cc
Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the
standard method for achieving state-of-the-art performance on NLP benchmarks. However …

mT5: A massively multilingual pre-trained text-to-text transformer

L Xue, N Constant, A Roberts, M Kale… - arXiv preprint arXiv …, 2020 - arxiv.org
The recent" Text-to-Text Transfer Transformer"(T5) leveraged a unified text-to-text format and
scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this …

COMET-22: Unbabel-IST 2022 submission for the metrics shared task

R Rei, JGC De Souza, D Alves, C Zerva… - Proceedings of the …, 2022 - aclanthology.org
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission, dubbed COMET-22, is an ensemble between a …

Results of the WMT21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain

M Freitag, R Rei, N Mathur, C Lo… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the results of the WMT21 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT21 News Translation …

Cramming: Training a Language Model on a single GPU in one day

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

SemEval-2022 task 11: Multilingual complex named entity recognition (MultiCoNER)

S Malmasi, A Fang, B Fetahu, S Kar… - Proceedings of the …, 2022 - aclanthology.org
We present the findings of SemEval-2022 Task 11 on Multilingual Complex Named Entity
Recognition MULTICONER. Divided into 13 tracks, the task focused on methods to identify …

SemEval-2022 task 6: iSarcasmEval, intended sarcasm detection in English and Arabic

IA Farha, S Oprea, S Wilson… - The 16th International …, 2022 - research.ed.ac.uk
iSarcasmEval is the first shared task to target intended sarcasm detection: the data
for this task was provided and labelled by the authors of the texts themselves. Such an …

Lifting the curse of multilinguality by pre-training modular transformers

J Pfeiffer, N Goyal, XV Lin, X Li, J Cross… - arXiv preprint arXiv …, 2022 - arxiv.org
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which
causes per-language performance to drop as they cover more languages. We address this …

Charformer: Fast character transformers via gradient-based subword tokenization

Y Tay, VQ Tran, S Ruder, J Gupta, HW Chung… - arXiv preprint arXiv …, 2021 - arxiv.org
State-of-the-art models in natural language processing rely on separate rigid subword
tokenization algorithms, which limit their generalization ability and adaptation to new …