UralicNLP: An NLP library for Uralic languages

M Hämäläinen - Journal of Open Source Software, 2019 - joss.theoj.org
In the past years the natural language processing (NLP) tools and resources for the small
Uralic languages have received a major uplift. The open-source infrastructure by Giellatekno …

An unsupervised method for OCR post-correction and spelling normalisation for Finnish

Q Duong, M Hämäläinen, S Hengchen - arXiv preprint arXiv:2011.03502, 2020 - arxiv.org
Historical corpora are known to contain errors introduced by OCR (optical character
recognition) methods used in the digitization process, often said to be degrading the …

Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

K Alnajjar, M Hämäläinen, J Rueter… - arXiv preprint arXiv …, 2020 - arxiv.org
We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to
re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur …

Open-source morphology for endangered Mordvinic languages

J Rueter, M Hämäläinen… - Workshop for NLP Open …, 2020 - researchportal.helsinki.fi
This document describes shared development of finite-state description of two closely
related but endangered minority languages, Erzya and Moksha. It touches upon …

FST morphology for the endangered Skolt Sami language

J Rueter, M Hämäläinen - arXiv preprint arXiv:2004.04803, 2020 - arxiv.org
We present advances in the development of a FST-based morphological analyzer and
generator for Skolt Sami. Like other minority Uralic languages, Skolt Sami exhibits a rich …

Finding Sami cognates with a character-based NMT approach

M Hämäläinen, J Rueter - Proceedings of the Workshop …, 2019 - journals.colorado.edu
We approach the problem of expanding the set of cognate relations with a sequence-to-
sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too …

On editing dictionaries for Uralic languages in an online environment

K Alnajjar, M Hämäläinen, J Rueter - Proceedings of the Sixth …, 2020 - aclanthology.org
We present an open online infrastructure for editing and visualization of dictionaries of
different Uralic languages (e.g. Erzya, Moksha, Skolt Sami and Komi-Zyrian). Our …

Extracting a semantic database with syntactic relations for Finnish to boost resources for endangered Uralic languages

M Hämäläinen - The Proceedings of Logic and Engineering of …, 2018 - helda.helsinki.fi
This paper introduces the second version of SemFi, a semantic database for Finnish with
syntactic relations. The previous version of SemFi has been used in poem generation, and …

Working Towards Digital Documentation of Uralic Languages With Open-Source Tools and Modern NLP Methods

M Hämäläinen, J Rueter, K Alnajjar… - Proceedings of the Big …, 2023 - aclanthology.org
We present our work towards building an infrastructure for documenting endangered
languages with the focus on Uralic languages in particular. Our infrastructure consists of …

DAG: Dictionary-Augmented Generation for Disambiguation of Sentences in Endangered Uralic Languages using ChatGPT

M Hämäläinen - arXiv preprint arXiv:2411.01531, 2024 - arxiv.org
We showcase that ChatGPT can be used to disambiguate lemmas in two endangered
languages ChatGPT is not proficient in, namely Erzya and Skolt Sami. We augment our …