Corpus and lexicon-mutual incompletness

C Krstev, D Vitas - Proceedings of the Corpus Linguistics …, 2005 - researchgate.net

Proceedings of the Corpus Linguistics Conference, 2005•researchgate.net

The natural language processing group (NLP group) at the Faculty of Mathematics, University of Belgrade, has for many years been producing various language resources, both corpora and lexicons (Vitas et al. 2003). In the past, however, our main goal was to produce as many resources as possible in order to keep pace with the so-called “big” languages. Having produced resources of considerable size, we turned our attention to evaluating their quality. To support this evaluation, we applied the Serbian morphological dictionary to the corpus in order to establish:

a) The extent and content of the corpus lexicon that is not covered by the e-dictionary. Here we want to determine what kinds of tools must be developed to recognize and tag unrecognized words such as derivatives, proper names, acronyms, foreign words, etc.

b) The part of the e-dictionary that is not covered by the lexicon found in the corpus. Here we look for uncovered lemmas (for instance, to what extent the corpus covers the names of zoological species) and uncovered forms (for instance, whether the imperfect tense is really vanishing from contemporary Serbian), etc.
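Both directions of the comparison in a) and b) can be framed as set differences between the word forms attested in the corpus and the entries of the e-dictionary. The following is a minimal Python sketch under stated assumptions, not the authors' tooling: the dictionary is a hypothetical toy in-memory mapping from inflected form to (lemma, tag), the tags are invented, and the helper names `tokenize` and `mutual_coverage` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical toy e-dictionary: inflected form -> (lemma, tag).
# Entries and tags are invented for illustration; no real tagset is implied.
E_DICTIONARY = {
    "kuća": ("kuća", "N:nom.sg"),
    "kuće": ("kuća", "N:gen.sg"),
    "kući": ("kuća", "N:dat.sg"),
    "pas": ("pas", "N:nom.sg"),
    "psa": ("pas", "N:gen.sg"),
    "vidra": ("vidra", "N:nom.sg"),  # a zoological species name
    "je": ("biti", "V:present.3sg"),
    "beše": ("biti", "V:imperfect.3sg"),
}


def tokenize(text):
    """Naive tokenizer: lowercased runs of Unicode word characters."""
    return re.findall(r"\w+", text.lower())


def mutual_coverage(tokens, dictionary):
    """Compare corpus tokens and dictionary entries in both directions."""
    counts = Counter(tokens)
    # (a) corpus forms the dictionary does not recognize, with frequencies
    unknown = {form: n for form, n in counts.items() if form not in dictionary}
    # (b) dictionary forms, lemmas and tags never attested in the corpus
    attested = {form for form in counts if form in dictionary}
    uncovered_forms = set(dictionary) - attested
    uncovered_lemmas = ({lemma for lemma, _ in dictionary.values()}
                        - {dictionary[f][0] for f in attested})
    uncovered_tags = ({tag for _, tag in dictionary.values()}
                      - {dictionary[f][1] for f in attested})
    return unknown, uncovered_forms, uncovered_lemmas, uncovered_tags


tokens = tokenize("Pas je u kući. NATO je ovde.")
unknown, forms, lemmas, tags = mutual_coverage(tokens, E_DICTIONARY)
print(unknown)         # unrecognized tokens: candidates for new tools or entries
print(sorted(forms))   # dictionary forms missing from the corpus
print(sorted(lemmas))  # lemmas with no attested form
print(sorted(tags))    # tags never attested in the corpus
```

In this toy run the acronym `nato` and the forms `u` and `ovde` fall into the unrecognized set, the species lemma `vidra` is never attested, and the imperfect tag appears among the unattested categories. The sketch lowercases everything and ignores multi-word units and homographs, which a real evaluation would have to handle.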

In Section 2 we will discuss the structure and size of the Serbian monolingual corpus and the accessibility of the part of it that is published on the web; in Section 3 we will present our Serbian morphological e-dictionary. In Section 4 we will present the results of analysing how well the e-dictionary covers the corpus, while in Section 5 we will analyse how well the corpus covers the e-dictionary. Finally, in Section 6 we will give some concluding remarks, mainly concerning our future work on further developing both the corpus and the e-dictionary on the basis of the results presented in this paper.

Cited by: 29 · All 6 versions