BERTimbau: pretrained BERT models for Brazilian Portuguese

F Souza, R Nogueira, R Lotufo - … 2020, Rio Grande, Brazil, October 20–23 …, 2020 - Springer
Recent advances in language representation using neural networks have made it viable to
transfer the learned internal states of large pretrained language models (LMs) to …

SemEval-2022 Task 2: Multilingual idiomaticity detection and sentence embedding

HT Madabushi, E Gow-Smith, M Garcia… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence
Embedding, which consists of two subtasks: (a) a binary classification task aimed at …

AStitchInLanguageModels: Dataset and methods for the exploration of idiomaticity in pre-trained language models

HT Madabushi, E Gow-Smith, C Scarton… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite their success in a variety of NLP tasks, pre-trained language models, due to their
heavy reliance on compositionality, fail to effectively capture the meanings of multiword …

Sabiá: Portuguese large language models

R Pires, H Abonizio, TS Almeida… - Brazilian Conference on …, 2023 - Springer
As the capabilities of language models continue to advance, it is conceivable that a “one-size-
fits-all” model will remain the main paradigm. For instance, given the vast number of …

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

D Carmo, M Piau, I Campiotti, R Nogueira… - arXiv preprint arXiv …, 2020 - arxiv.org
In natural language processing (NLP), there is a need for more resources in Portuguese,
since much of the data used in state-of-the-art research is in other languages. In this …

Advancing neural encoding of Portuguese with transformer Albertina PT-*

J Rodrigues, L Gomes, J Silva, A Branco… - EPIA Conference on …, 2023 - Springer
To advance the neural encoding of Portuguese (PT), and a fortiori the technological
preparation of this language for the digital age, we developed a Transformer-based …

ZeroBERTo: Leveraging zero-shot text classification by topic modeling

A Alcoforado, TP Ferraz, R Gerber, E Bustos… - … Processing of the …, 2022 - Springer
Traditional text classification approaches often require a substantial amount of labeled data,
which is difficult to obtain, especially in restricted domains or less widespread languages. This lack …

Cabrita: closing the gap for foreign languages

C Larcher, M Piau, P Finardi, P Gengo… - arXiv preprint arXiv …, 2023 - arxiv.org
The strategy of training a model from scratch in a specific language or domain serves two
essential purposes: i) enhancing performance in the particular linguistic or domain context …

ArEntail: manually-curated Arabic natural language inference dataset from news headlines

R Obeidat, Y Al-Harahsheh, M Al-Ayyoub… - Language Resources …, 2024 - Springer
Natural language inference (NLI), also known as textual entailment recognition (TER), is a
crucial task in natural language processing that combines many fundamental aspects of …

BERT models for Brazilian Portuguese: Pretraining, evaluation and tokenization analysis

FC Souza, RF Nogueira, RA Lotufo - Applied Soft Computing, 2023 - Elsevier
Recent advances in language representation using neural networks have made it viable to
transfer the learned internal states of large pretrained language models (LMs) to …