Taxonomic loss for morphological glossing of low-resource languages

M Ginn, A Palmer - arXiv preprint arXiv:2308.15055, 2023 - arxiv.org
Morpheme glossing is a critical task in automated language documentation and can benefit
other downstream applications greatly. While state-of-the-art glossing systems perform very …

Robust generalization strategies for morpheme glossing in an endangered language documentation context

M Ginn, A Palmer - arXiv preprint arXiv:2311.02777, 2023 - arxiv.org
Generalization is of particular importance in resource-constrained settings, where the
available training data may represent only a small fraction of the distribution of possible …

Natural Language Processing RELIES on Linguistics

J Opitz, S Wein, N Schneider - arXiv preprint arXiv:2405.05966, 2024 - arxiv.org
Large Language Models (LLMs) have become capable of generating highly fluent text in
certain languages, without modules specially designed to capture grammar or semantic …

FonMTL: Towards Multitask Learning for the Fon Language

BFP Dossou, I Houndayi, P Zantou… - arXiv preprint arXiv …, 2023 - arxiv.org
The Fon language, spoken by an average of 2 million people, is a truly low-resourced
African language, with a limited online presence, and existing datasets (just to name but a …

Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching

P Rybak - arXiv preprint arXiv:2402.14408, 2024 - arxiv.org
Pre-trained language models have revolutionized the natural language understanding
landscape, most notably BERT (Bidirectional Encoder Representations from Transformers) …

SIGMORPHON 2023 shared task of interlinear glossing: Baseline model

M Ginn - arXiv preprint arXiv:2303.14234, 2023 - arxiv.org
Language documentation is a critical aspect of language preservation, often including the
creation of Interlinear Glossed Text (IGT). Creating IGT is time-consuming and tedious, and …

Security Knowledge-Guided Fuzzing of Deep Learning Libraries

NS Harzevili, MM Mohajer, M Wei, HV Pham… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, many Deep Learning fuzzers have been proposed for testing of DL libraries.
However, they either perform unguided input generation (e.g., not considering the …

Classifier identification in Ancient Egyptian as a low-resource sequence-labelling task

D Nikolaev, J Grotenhuis, H Harel… - arXiv preprint arXiv …, 2024 - arxiv.org
The complex Ancient Egyptian (AE) writing system was characterised by widespread use of
graphemic classifiers (determinatives): silent (unpronounced) hieroglyphic signs clarifying …

Bridging the Gap: Towards Linguistic Resource Development for the Low-Resource Lambani Language

A Dasare, AR Chowdhury, AS Menon, K Anand… - … Conference on Speech …, 2023 - Springer
Language technology development is crucial for many downstream applications
such as machine translation and language understanding. The lack of linguistic resources …

Low-Resource Monolingual Transformer Language Models

L Gessler - 2023 - search.proquest.com
Since the publication of BERT in 2018, pretrained Transformer language models (TLMs)
have been a foundational requirement for almost all natural language processing systems …