Hire a linguist!: Learning endangered languages in LLMs with in-context linguistic descriptions

K Zhang, Y Choi, Z Song, T He… - Findings of the …, 2024 - aclanthology.org
How can large language models (LLMs) process and translate endangered languages?
Many languages lack a large corpus to train a decent LLM; therefore existing LLMs rarely …

Towards multilingual interlinear morphological glossing

S Okabe, F Yvon - 2023 Conference on Empirical Methods in Natural …, 2023 - hal.science
Interlinear Morphological Glosses are annotations produced in the context of language
documentation. Their goal is to identify morphs occurring in an L1 sentence and to explicit …

SigMoreFun submission to the SIGMORPHON shared task on interlinear glossing

T He, L Tjuatja, N Robinson, S Watanabe… - Proceedings of the 20th …, 2023 - par.nsf.gov
In our submission to the SIGMORPHON 2023 Shared Task on interlinear glossing (IGT), we
explore approaches to data augmentation and modeling across seven low-resource …

Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions

K Zhang, YM Choi, Z Song, T He, WY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
How can large language models (LLMs) process and translate endangered languages?
Many languages lack a large corpus to train a decent LLM; therefore existing LLMs rarely …

Glossy bytes: Neural glossing using subword encoding

Z Cross, M Yun, A Apparaju, J MacCabe… - Proceedings of the …, 2023 - aclanthology.org
This paper presents several different neural subword modelling based approaches to
interlinear glossing for seven under-resourced languages as a part of the 2023 …

Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation

B Shandilya, A Palmer - arXiv preprint arXiv:2410.00387, 2024 - arxiv.org
The data and compute requirements of current language modeling technology pose
challenges for the processing and analysis of low-resource languages. Declarative linguistic …

Robust generalization strategies for morpheme glossing in an endangered language documentation context

M Ginn, A Palmer - arXiv preprint arXiv:2311.02777, 2023 - arxiv.org
Generalization is of particular importance in resource-constrained settings, where the
available training data may represent only a small fraction of the distribution of possible …

Bootstrapping UMR Annotations for Arapaho from Language Documentation Resources

MJ Buchholz, J Bonn, CB Post, A Cowell… - Proceedings of the …, 2024 - aclanthology.org
Uniform Meaning Representation (UMR) is a semantic labeling system in the AMR
family designed to be uniformly applicable to typologically diverse languages. The UMR …

Wav2Gloss: Generating Interlinear Glossed Text from Speech

T He, K Choi, L Tjuatja, NR Robinson, J Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Thousands of the world's languages are in danger of extinction--a tremendous threat to
cultural identities and human language diversity. Interlinear Glossed Text (IGT) is a form of …

GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning

R Ramos, EA Chimoto, M ter Hoeve… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce GrammaMT, a grammatically-aware prompting approach for machine
translation that uses Interlinear Glossed Text (IGT), a common form of linguistic description …