In this paper we evaluate the influence of different document representations in the results of multilingual news clustering. We aim at proving whether or not the use of only named …
In this article, we present a new algorithm for clustering a bilingual collection of comparable news items in groups of specific topics. Our hypothesis is that named entities (NE s) are …
Clustering is a central technique in NLP. Consequently, clustering evaluation is of great importance. Many clustering algorithms are evaluated by their success in tagging corpus …
We address the problem of linking related documents across languages in a multilingual collection. We evaluate three diverse unsupervised methods to represent and compare …
This paper presents an approach for Multilingual News Document Clustering in comparable corpora. We have implemented two algorithms of heuristic nature that follow the approach …