Multilingual document clustering: an heuristic approach based on cognate named entities

Authors

Soto Montalvo, Raquel Martínez, Arantza Casillas, Víctor Fresno

Publication date

2006/7

Conference

Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Pages

1145-1152

Description

This paper presents an approach for Multilingual Document Clustering in comparable corpora. The algorithm is of heuristic nature and it uses as unique evidence for clustering the identification of cognate named entities between both sides of the comparable corpora. One of the main advantages of this approach is that it does not depend on bilingual or multilingual resources. However, it depends on the possibility of identifying cognate named entities between the languages used in the corpus. An additional advantage of the approach is that it does not need any information about the right number of clusters; the algorithm calculates it. We have tested this approach with a comparable corpus of news written in English and Spanish. In addition, we have compared the results with a system which translates selected document features. The obtained results are encouraging.

Total citations

Cited by 32

Citations per year: 2006: 1, 2007: 2, 2008: 1, 2009: 2, 2010: 2, 2011: 3, 2012: 2, 2013: 5, 2014: 2, 2015: 4, 2016: 1, 2017: 2, 2018: 2, 2019: 1, 2020: 2