Text stemming: Approaches, applications, and challenges

J Singh, V Gupta - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
Stemming is a process in which the variant word forms are mapped to their base form. It is
among the basic text pre-processing approaches used in Language Modeling, Natural …

Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

SJ Mielke, Z Alyafeai, E Salesky, C Raffel… - arXiv preprint arXiv …, 2021 - arxiv.org
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …

Morfessor 2.0: Toolkit for statistical morphological segmentation

P Smit, S Virpioja, SA Grönroos… - The 14th Conference of …, 2014 - aaltodoc.aalto.fi
Morfessor is a family of probabilistic machine learning methods for finding the morphological
segmentation from raw text data. Recent developments include the development of semi …

Non-Māori-speaking New Zealanders have a Māori proto-lexicon

Y Oh, S Todd, C Beckner, J Hay, J King, J Needle - Scientific reports, 2020 - nature.com
We investigate implicit vocabulary learning by adults who are exposed to a language in their
ambient environment. Most New Zealanders do not speak Māori, yet are exposed to it …

An unsupervised method for uncovering morphological chains

K Narasimhan, R Barzilay, T Jaakkola - Transactions of the …, 2015 - direct.mit.edu
Most state-of-the-art systems today produce morphological analysis based only on
orthographic patterns. In contrast, we propose a model for unsupervised morphological …

Computational learning of morphology

JA Goldsmith, JL Lee, A Xanthos - Annual Review of Linguistics, 2017 - annualreviews.org
This article reviews research on the unsupervised learning of morphology, that is, the
induction of morphological knowledge with no prior knowledge of the language beyond the …

Supervised morphological segmentation in a low-resource learning setting using conditional random fields

T Ruokolainen, O Kohonen, S Virpioja… - Proceedings of the …, 2013 - aclanthology.org
We discuss data-driven morphological segmentation, in which word forms are segmented
into morphs, the surface forms of morphemes. Our focus is on a low-resource learning …

MorphAGram, evaluation and framework for unsupervised morphological segmentation

R Eskander, F Callejas, E Nichols… - Proceedings of the …, 2020 - aclanthology.org
Computational morphological segmentation has been an active research topic for decades
as it is beneficial for many natural language processing tasks. With the high cost of manually …

A comparative study of minimally supervised morphological segmentation

T Ruokolainen, O Kohonen, K Sirts… - Computational …, 2016 - direct.mit.edu
This article presents a comparative study of a subfield of morphology learning referred to as
minimally supervised morphological segmentation. In morphological segmentation, word …

Probabilistic finite-state morphological segmenter for Wixarika (Huichol) language

M Mager, D Carrillo, I Meza - Journal of Intelligent & Fuzzy …, 2018 - content.iospress.com
In this work, we present a morphological segmenter for the Mexican indigenous language
Wixarika. Segmentation is fundamental for rich morphological languages, a common aspect …