A multi-document summarization system based on statistics and linguistic treatment

R Ferreira, L de Souza Cabral, F Freitas, RD Lins, G de França Silva, SJ Simske, L Favaro
Expert Systems with Applications, 2014, Elsevier
Abstract
The quantity of data available on the Internet today has grown so large that it is no longer feasible for people to sift useful information from it efficiently. Text summarization techniques offer one solution to this problem. Text summarization, the process of automatically creating a shorter version of one or more text documents, is an important way of finding relevant information in large text libraries or on the Internet. This paper presents a multi-document summarization system that concisely extracts the main aspects of a set of documents while trying to avoid the typical problems of this type of summarization: information redundancy and diversity. This is achieved through a new sentence clustering algorithm based on a graph model that makes use of statistical similarities and linguistic treatment. The proposed system was evaluated on the DUC 2002 dataset, where it surpassed the DUC competitors by a 50% margin in F-measure in the best case.
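The abstract does not spell out the clustering algorithm, so the following is only a rough sketch of the general idea of graph-based sentence clustering, not the authors' method. It assumes bag-of-words cosine similarity as the statistical measure, connected components as the clusters, and the first sentence of each cluster as its representative; it omits the linguistic treatment entirely. The function names and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Group sentence indices into connected components of a similarity graph.

    Assumption: two sentences share an edge when their cosine similarity
    is at least `threshold` (an illustrative value, not from the paper).
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def summarize(sentences, threshold=0.5):
    """Keep the earliest sentence of each cluster to limit redundancy."""
    return [sentences[c[0]] for c in cluster_sentences(sentences, threshold)]
```

Keeping one sentence per cluster is the simplest way to show how clustering can reduce redundancy while still covering distinct topics; a real system would need a more careful representative-selection step.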