The natural language processing group (NLP group) at the Faculty of Mathematics, University of Belgrade, has for many years been engaged in producing various language resources, both corpora and lexicons (Vitas et al. 2003). In the past, our main goal was to produce as many resources as possible in order to keep pace with the so-called “big” languages. Having produced resources of considerable size, we have now turned our attention to evaluating their quality. To support this process, we applied the Serbian morphological e-dictionary to the corpus in order to establish:
a) The extent and content of the corpus lexicon that is not covered by the e-dictionary. Here we aim to determine which tools must be developed to recognize and tag unrecognized words such as derivatives, proper names, acronyms, foreign words, etc.
b) The part of the e-dictionary that is not covered by the lexicon found in the corpus. Here we look for uncovered lemmas (for instance, to what extent the corpus covers the names of zoological species) and uncovered forms (for instance, whether the imperfect tense is really vanishing from contemporary Serbian).
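The two-way coverage check in a) and b) amounts to comparing the set of word forms attested in the corpus with the set of forms (and lemmas) listed in the e-dictionary. The following is only a minimal sketch under stated assumptions: the dictionary is assumed to be available as a list of (inflected form, lemma, tag) triples, the corpus as plain text, and lookup is simplified to lowercase matching. The entries, tags, and the functions `tokenize` and `coverage` are hypothetical illustrations, not the group's actual tools or dictionary format.

```python
import re
from collections import Counter

# Hypothetical toy e-dictionary of (inflected form, lemma, tag) triples.
# Tags are placeholders, not the real Serbian e-dictionary tag set.
DICTIONARY = [
    ("kuća", "kuća", "N"),
    ("kuće", "kuća", "N"),
    ("kući", "kuća", "N"),
    ("pas", "pas", "N"),
    ("psa", "pas", "N"),
    ("mačka", "mačka", "N"),
    ("piše", "pisati", "V"),
]


def tokenize(text):
    """Split text into lowercase word tokens (Unicode letters only).

    Lowercasing is a simplifying assumption; a real dictionary lookup
    would need to respect case for proper names and acronyms.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage(tokens, dictionary):
    """Return (a) unrecognized corpus forms with frequencies,
    (b1) dictionary forms never attested, (b2) lemmas never attested."""
    forms = {form for form, _, _ in dictionary}
    lemmas_of = {}  # form -> set of lemmas it may belong to
    for form, lemma, _ in dictionary:
        lemmas_of.setdefault(form, set()).add(lemma)

    freq = Counter(tokens)
    # a) corpus word forms the dictionary does not recognize
    unrecognized = {tok: n for tok, n in freq.items() if tok not in forms}
    # b) dictionary forms and lemmas never attested in the corpus
    attested_forms = set(freq) & forms
    unused_forms = forms - attested_forms
    attested_lemmas = {l for f in attested_forms for l in lemmas_of[f]}
    unused_lemmas = {l for _, l, _ in dictionary} - attested_lemmas
    return unrecognized, unused_forms, unused_lemmas


if __name__ == "__main__":
    corpus = "Pas piše kući. Beograd, NLP, pas."
    a, b_forms, b_lemmas = coverage(tokenize(corpus), DICTIONARY)
    print(a)         # unrecognized forms with their frequencies
    print(b_forms)   # dictionary forms absent from the corpus
    print(b_lemmas)  # lemmas with no attested form
```

On this toy input, the proper name and the acronym fall into category a), while the lemma with no attested form falls into category b). Reporting frequencies for unrecognized forms is a design choice that helps prioritize which word classes most need new recognition tools.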
In Section 2 we discuss the structure of the Serbian monolingual corpus, its size, and the accessibility of the part of it that is available on the web. In Section 3 we present our Serbian morphological e-dictionary. In Section 4 we present the results of the analysis of how well the e-dictionary covers the corpus, while in Section 5 we analyse the extent to which the e-dictionary is covered by the corpus. Finally, Section 6 gives some concluding remarks, mainly concerning our future work on the further development of both the corpus and the e-dictionary on the basis of the results presented in this paper.