BERTabaporu: assessing a genre-specific language model for Portuguese NLP

PB Costa, MC Pavan, WR Santos… - Proceedings of the …, 2023 - aclanthology.org
Transformer-based language models such as Bidirectional Encoder Representations from
Transformers (BERT) are now mainstream in the NLP field, but extensions to languages …
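
As a minimal sketch of how a genre-specific checkpoint like BERTabaporu would be queried for masked-token prediction, assuming the HuggingFace transformers library and a Hub identifier such as pablocosta/bertabaporu-base-uncased (the hosting location and name are assumptions, not stated in the snippet):

# Minimal sketch: fill-mask with a Portuguese BERT checkpoint.
# NOTE: the model identifier below is an assumption about where the
# BERTabaporu checkpoint is hosted; substitute the actual Hub name.
from transformers import pipeline

fill = pipeline("fill-mask", model="pablocosta/bertabaporu-base-uncased")

# Twitter-flavoured Portuguese input; [MASK] is BERT's mask token.
for pred in fill("hoje o jogo foi muito [MASK]")[:3]:
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")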

Hints on the data for language modeling of synthetic languages with transformers

R Zevallos, N Bel - Proceedings of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Language Models (LMs) are increasingly useful for providing
representations upon which to train Natural Language Processing applications. However …

A Survey of Large Language Models for European Languages

W Ali, S Pyysalo - arXiv preprint arXiv:2408.15040, 2024 - arxiv.org
Large Language Models (LLMs) have gained significant attention due to their high
performance on a wide range of natural language tasks since the release of ChatGPT. The …

Emerging roots: Investigating early access to meaning in Maltese auditory word recognition

J Nieder, R van de Vijver, A Ussishkin - Cognitive Science, 2024 - Wiley Online Library
In Semitic languages, the consonantal root is central to morphology, linking form and
meaning. While psycholinguistic studies highlight its importance in language processing, the …

Disentangling Singlish discourse particles with task-driven representation

LTE Foo, LHX Ng - Proceedings of the 6th ACM International Conference …, 2024 - dl.acm.org
Singlish, formally Colloquial Singapore English, is an English-based creole language
originating from the Southeast Asian country of Singapore. The language contains influences …

Tokenisation in machine translation does matter: The impact of different tokenisation approaches for Maltese

K Abela, K Micallef, M Tanti, C Borg - Proceedings of the …, 2024 - aclanthology.org
In Machine Translation, various tokenisers are used to segment inputs before
training a model. Despite tokenisation being mostly considered a solved problem for …
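
To make the entry's premise concrete, here is a minimal sketch comparing how two off-the-shelf subword tokenisers segment the same Maltese sentence; the checkpoints are ordinary public models chosen for illustration, not necessarily those evaluated in the paper:

# Compare subword segmentations of one Maltese sentence across tokenisers.
from transformers import AutoTokenizer

sentence = "Il-lingwa Maltija hija unika"  # "The Maltese language is unique"

for name in ["bert-base-multilingual-cased", "xlm-roberta-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    pieces = tok.tokenize(sentence)
    print(f"{name}: {len(pieces)} subwords -> {pieces}")

Different vocabularies typically produce different segment counts for Maltese, which changes the sequence lengths and vocabulary coverage an MT model is trained on.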

Evaluating Language Model Vulnerability to Poisoning Attacks in Low-Resource Settings

R Plant, MV Giuffrida, N Pitropakis… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Pre-trained language models are a highly effective source of knowledge transfer for natural
language processing tasks, as their development represents an investment of resources …

UOM-Constrained IWSLT 2024 Shared Task Submission - Maltese Speech Translation

K Abela, MAR Riyadh, M Galea, A Busuttil… - Proceedings of the …, 2024 - aclanthology.org
This paper presents our IWSLT-2024 shared task submission on the low-resource track. The
submission forms part of the constrained setup, which implies limited data for training. Following …

Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?

A Alajrami, K Margatina, N Aletras - arXiv preprint arXiv:2310.17271, 2023 - arxiv.org
Understanding how and what pre-trained language models (PLMs) learn about language is
an open challenge in natural language processing. Previous work has focused on …

Exploring the impact of transliteration on NLP performance: Treating Maltese as an Arabic dialect

Multilingual models such as mBERT have been demonstrated to exhibit impressive
cross-lingual transfer for a number of languages. Despite this, the performance drops for …
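
The transliteration idea behind this entry can be illustrated with a toy sketch: map Maltese (Latin-script) characters onto Arabic script so that an Arabic-pretrained model sees familiar symbols. The character table and helper below are hypothetical simplifications for illustration only; the paper's actual pipeline handles many more cases (vowels, digraphs, ambiguity).

# Toy Latin-to-Arabic transliteration table; hypothetical and incomplete.
LATIN_TO_ARABIC = {
    "b": "ب", "t": "ت", "d": "د", "r": "ر", "s": "س",
    "k": "ك", "l": "ل", "m": "م", "n": "ن", "ħ": "ح",
    "għ": "ع", "x": "ش", "q": "ق", "a": "ا", "i": "ي", "u": "و",
}

def transliterate(word: str) -> str:
    """Greedy longest-match transliteration over the toy table."""
    out, i = [], 0
    while i < len(word):
        # Try two-character sequences (e.g. the digraph "għ") first.
        if word[i:i + 2] in LATIN_TO_ARABIC:
            out.append(LATIN_TO_ARABIC[word[i:i + 2]])
            i += 2
        else:
            # Fall back to a single character; pass unknowns through.
            out.append(LATIN_TO_ARABIC.get(word[i], word[i]))
            i += 1
    return "".join(out)

print(transliterate("ħanut"))  # Maltese "shop" -> an Arabic-script rendering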