Automatic Decomposition of Multi-Author Documents Using Grammar Analysis.

M Tschuggnall, G Specht - Grundlagen von Datenbanken, 2014 - academia.edu
Abstract
The task of text segmentation is to automatically split a text document into individual subparts that differ according to specific measures. This paper presents an approach that attempts to separate the text sections of a collaboratively written document based on the grammatical syntax of its authors. The main idea is to quantify differences in the authors' grammatical writing styles and to use this information to build paragraph clusters, each of which is assigned to a different author. To analyze a writer's style, the text is split into single sentences, and a full parse tree is computed for each sentence. From these trees, a profile is then computed for each paragraph that represents its main grammatical characteristics. Finally, the profiles serve as input to common clustering algorithms. An extensive evaluation on different English data sets shows promising results, and a supplementary analysis indicates that, in general, common classification algorithms perform better than clustering approaches.
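The pipeline in the abstract (parse each sentence, build a per-paragraph profile, cluster the profiles) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify the profile features, so grammar production-rule frequencies are used as a stand-in; parse trees are assumed to come from an external parser as Penn Treebank-style bracketed strings; average-linkage agglomerative clustering with cosine distance is one example of a "common clustering algorithm"; and the number of authors `k` is assumed to be known. All function names (`parse_tree`, `productions`, `profile`, `cluster`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def parse_tree(s):
    """Parse a bracketed tree string into (label, children); words are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return parse()


def productions(tree):
    """Collect syntactic rules like 'S -> NP VP', ignoring word-level (lexical) rules."""
    label, children = tree
    sub = [c for c in children if isinstance(c, tuple)]
    rules = []
    if sub:  # preterminals such as (PRP I) produce no rule
        rules.append(label + " -> " + " ".join(c[0] for c in sub))
        for c in sub:
            rules.extend(productions(c))
    return rules


def profile(paragraph_trees):
    """Hypothetical paragraph profile: relative frequency of each production rule."""
    counts = Counter()
    for t in paragraph_trees:
        counts.update(productions(parse_tree(t)))
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()}


def cosine_distance(p, q):
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def cluster(profiles, k):
    """Average-linkage agglomerative clustering of paragraph indices into k groups."""
    clusters = [[i] for i in range(len(profiles))]

    def avg_dist(a, b):
        return sum(cosine_distance(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps a's index valid
        del clusters[b]
    return clusters


# Toy document: paragraphs 0 and 2 use simple subject-verb-object sentences,
# paragraphs 1 and 3 open with a prepositional phrase and a comma.
paragraphs = [
    ["(S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN dog))))",
     "(S (NP (PRP She)) (VP (VBD ate) (NP (DT an) (NN apple))))"],
    ["(S (PP (IN In) (NP (NN spring))) (, ,) (NP (NNS birds)) (VP (VBP sing)))"],
    ["(S (NP (PRP He)) (VP (VBD read) (NP (DT a) (NN book))))"],
    ["(S (PP (IN At) (NP (NN night))) (, ,) (NP (NNS owls)) (VP (VBP hunt)))"],
]
profiles = [profile(p) for p in paragraphs]
print(cluster(profiles, 2))  # groups paragraphs {0, 2} and {1, 3}
```

Restricting the profile to non-lexical rules keeps it focused on grammatical structure rather than vocabulary, which matches the abstract's emphasis on grammatical writing style; the specific feature set and clustering algorithm are interchangeable choices in this sketch.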

example.edu/paper.pdf