Low-resource active learning of North Sámi morphological segmentation

SA Grönroos, K Jokinen, K Hiovain… - Septentrio …, 2015 - septentrio.uit.no
Septentrio Conference Series, 2015 - septentrio.uit.no
Abstract
Many Uralic languages have a rich morphological structure but lack the morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation of the North Sámi language using a large unannotated corpus and a small number of human-annotated word forms selected with an active learning approach. For statistical learning, we use the semi-supervised Morfessor Baseline and FlatCat methods. After annotating 237 words with our active learning setup, we improve morph boundary recall by over 20% with no loss of precision.
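The abstract reports results as morph boundary precision and recall. As a rough illustration of how such scores can be computed, the sketch below compares predicted and reference segmentations by the character offsets of their boundaries. It is a minimal sketch and not the authors' evaluation procedure: the function names `boundaries` and `boundary_precision_recall` are hypothetical, the micro-averaging over all words is an assumption, and the toy strings are placeholders, not North Sámi data.

```python
def boundaries(segmentation):
    """Return the set of character offsets at which a morph boundary falls.

    `segmentation` is a list of morph strings, e.g. ["ab", "c", "de"].
    """
    positions, offset = set(), 0
    for morph in segmentation[:-1]:  # no boundary after the final morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_precision_recall(predicted, gold):
    """Micro-averaged boundary precision and recall (assumed averaging scheme).

    Both arguments map each word to its segmentation (a list of morphs).
    """
    true_pos = n_pred = n_gold = 0
    for word, gold_seg in gold.items():
        pred_b = boundaries(predicted[word])
        gold_b = boundaries(gold_seg)
        true_pos += len(pred_b & gold_b)
        n_pred += len(pred_b)
        n_gold += len(gold_b)
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_gold if n_gold else 1.0
    return precision, recall


# Placeholder data: the model finds two of the three reference boundaries
# and proposes no spurious ones.
gold = {"abcde": ["ab", "c", "de"], "xyz": ["x", "yz"]}
predicted = {"abcde": ["ab", "cde"], "xyz": ["x", "yz"]}
precision, recall = boundary_precision_recall(predicted, gold)
```

In this toy case every predicted boundary is correct but one reference boundary is missed, so precision is perfect while recall is not. A gain in recall with no loss of precision, as described in the abstract, corresponds to recovering such missed boundaries without adding incorrect ones.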