In recent years, pre-trained transformer-based language models (LM) have become a key resource for implementing most NLP tasks. However, pre-training such models demands …
Orthographic variation is very common in Luxembourgish texts due to the absence of a fully-fledged standard variety. Additionally, developing NLP tools for Luxembourgish is a difficult …
Large language models are very resource intensive, both financially and environmentally, and require an amount of training data which is simply unobtainable for the majority of NLP …
A Plum, C Döhmer, E Milano, AM Lutgen… - arXiv preprint arXiv …, 2024 - arxiv.org
The Universal Dependencies (UD) project has significantly expanded linguistic coverage across 161 languages, yet Luxembourgish, a West Germanic language spoken by …
Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource …
Relation extraction is essential for extracting and understanding biographical information in the context of digital humanities and related subjects. There is a growing interest in the …
Sentence embedding models play a key role in various Natural Language Processing tasks, such as Topic Modeling, Document Clustering and Recommendation Systems. However …
Training large language models is challenging when data availability is limited, as is the case for low-resource languages. We investigate different data augmentation techniques for …
F Philippy, S Haddadan, S Guo - arXiv preprint arXiv:2404.03912, 2024 - arxiv.org
In NLP, zero-shot classification (ZSC) is the task of assigning labels to textual data without any labeled examples for the target classes. A common method for ZSC is to fine-tune a …