languages is finding relevant training data for the statistical language models. A large amount
of data is required, because the models must estimate probabilities for all possible word
sequences. For Finnish, Estonian and the other Finno-Ugric languages, a particular problem
with the data is the huge number of distinct word forms that are common in normal speech.
The same problem also exists in other language technology applications such as machine …