Authors
Soto Montalvo, Raquel Martínez, Arantza Casillas, Víctor Fresno
Publication date
2006/7
Conference
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Pages
1145-1152
Description
This paper presents an approach for Multilingual Document Clustering in comparable corpora. The algorithm is of heuristic nature and it uses as unique evidence for clustering the identification of cognate named entities between both sides of the comparable corpora. One of the main advantages of this approach is that it does not depend on bilingual or multilingual resources. However, it depends on the possibility of identifying cognate named entities between the languages used in the corpus. An additional advantage of the approach is that it does not need any information about the right number of clusters; the algorithm calculates it. We have tested this approach with a comparable corpus of news written in English and Spanish. In addition, we have compared the results with a system which translates selected document features. The obtained results are encouraging.
Total citations
[Chart: citations per year, 2006-2020]
Scholar articles
S Montalvo, R Martínez, A Casillas, V Fresno - Proceedings of the 21st International Conference on …, 2006