BERTimbau: pretrained BERT models for Brazilian Portuguese

F Souza, R Nogueira, R Lotufo - … 2020, Rio Grande, Brazil, October 20–23 …, 2020 - Springer
Recent advances in language representation using neural networks have made it viable to
transfer the learned internal states of large pretrained language models (LMs) to …

SemEval-2022 Task 2: Multilingual idiomaticity detection and sentence embedding

HT Madabushi, E Gow-Smith, M Garcia… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence
Embedding, which consists of two subtasks: (a) a binary classification task aimed at …

AStitchInLanguageModels: Dataset and methods for the exploration of idiomaticity in pre-trained language models

HT Madabushi, E Gow-Smith, C Scarton… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite their success in a variety of NLP tasks, pre-trained language models, due to their
heavy reliance on compositionality, fail to effectively capture the meanings of multiword …

Sabiá: Portuguese large language models

R Pires, H Abonizio, TS Almeida… - Brazilian Conference on …, 2023 - Springer
As the capabilities of language models continue to advance, it is conceivable that a “one-size-
fits-all” model will remain the main paradigm. For instance, given the vast number of …

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

D Carmo, M Piau, I Campiotti, R Nogueira… - arXiv preprint arXiv …, 2020 - arxiv.org
In natural language processing (NLP), there is a need for more resources in Portuguese,
since much of the data used in state-of-the-art research is in other languages. In this …

Advancing neural encoding of Portuguese with transformer Albertina PT-*

J Rodrigues, L Gomes, J Silva, A Branco… - EPIA Conference on …, 2023 - Springer
To advance the neural encoding of Portuguese (PT), and a fortiori the technological
preparation of this language for the digital age, we developed a Transformer-based …

ZeroBERTo: Leveraging zero-shot text classification by topic modeling

A Alcoforado, TP Ferraz, R Gerber, E Bustos… - … Processing of the …, 2022 - Springer
Traditional text classification approaches often require a substantial amount of labeled data,
which is difficult to obtain, especially in restricted domains or less widespread languages. This lack …

Cabrita: closing the gap for foreign languages

C Larcher, M Piau, P Finardi, P Gengo… - arXiv preprint arXiv …, 2023 - arxiv.org
The strategy of training a model from scratch in a specific language or domain serves two
essential purposes: i) enhancing performance in the particular linguistic or domain context …

ArEntail: manually-curated Arabic natural language inference dataset from news headlines

R Obeidat, Y Al-Harahsheh, M Al-Ayyoub… - Language Resources …, 2024 - Springer
Natural language inference (NLI), also known as textual entailment recognition (TER), is a
crucial task in natural language processing that combines many fundamental aspects of …

BERT models for Brazilian Portuguese: Pretraining, evaluation and tokenization analysis

FC Souza, RF Nogueira, RA Lotufo - Applied Soft Computing, 2023 - Elsevier
Recent advances in language representation using neural networks have made it viable to
transfer the learned internal states of large pretrained language models (LMs) to …