alleviates the small size of available NER training corpora for German with distributional
generalization features trained on large unlabelled corpora. We vary the size and source of
the generalization corpus and find improvements of 6% F1 score (in-domain) and 9%
(out-of-domain) over simple supervised training.
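
As a rough illustration of what a distributional generalization feature can look like, the sketch below adds a word-cluster ID to a token's feature set, so that a word unseen in the labelled data can still share a feature with words that were seen. The cluster table, the cluster IDs, and the function name `token_features` are hypothetical and chosen only for illustration; they are not taken from the system described here.

```python
from typing import Dict, List

# Hypothetical cluster table. In practice it would be induced from a large
# unlabelled corpus; the words and cluster IDs here are invented.
CLUSTERS: Dict[str, str] = {
    "berlin": "c17",
    "hamburg": "c17",
    "merkel": "c42",
}


def token_features(tokens: List[str], i: int, clusters: Dict[str, str]) -> Dict[str, str]:
    """Return surface features for tokens[i], plus a cluster ID if one is known."""
    word = tokens[i]
    feats = {
        "word": word.lower(),
        "is_capitalized": str(word[:1].isupper()),
        "suffix3": word[-3:].lower(),
    }
    # Generalization feature: words in the same cluster share this value,
    # even if only one of them occurs in the labelled training data.
    cluster = clusters.get(word.lower())
    if cluster is not None:
        feats["cluster"] = cluster
    return feats


features = token_features(["Sie", "wohnt", "in", "Hamburg"], 3, CLUSTERS)
```

Under these assumptions, "Hamburg" and "Berlin" receive the same `cluster` value, which is the kind of sharing a supervised tagger can exploit when one of the two words is missing from the training corpus.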