Authors
Torsten Zesch, Iryna Gurevych, Max Mühlhäuser
Publication date
2007
Journal
Data Structures for Linguistic Resources and Applications
Volume
197205
Description
We analyze Wikipedia as a lexical semantic resource and compare it with conventional resources such as dictionaries, thesauri, and semantic wordnets. Different parts of Wikipedia reflect different aspects of these resources. We show that Wikipedia contains a vast amount of knowledge about, e.g., named entities, domain-specific terms, and rare word senses. If Wikipedia is to be used as a lexical semantic resource in large-scale NLP tasks, efficient programmatic access to the knowledge therein is required. We review existing access mechanisms and show that they are limited with respect to performance and the provided access functions. Therefore, we introduce a general-purpose, high-performance Java-based Wikipedia API that overcomes these limitations. It is available for research purposes at http://www.ukp.tu-darmstadt.de/software/WikipediaAPI.
Total citations
[Bar chart: citations per year, 2007 to 2022]
Scholar articles
T Zesch, I Gurevych, M Mühlhäuser - Data Structures for Linguistic Resources and …, 2007