Exploiting text structure for topic identification

T Nomoto, Y Matsumoto - Fourth Workshop on Very Large …, 1996 - aclanthology.org

Fourth Workshop on Very Large Corpora, 1996•aclanthology.org

Summary

The paper demonstrates how information on text structure can improve the identification of topical words in texts, where identification is based on a probabilistic model of text categorization. We use texts that are not explicitly structured. A text structure is identified by measuring the similarity between the segments comprising the text and its title. It is shown that a text structure thus identified gives a good clue to finding the parts of the text most relevant to its content. The significance of exploiting structural information for topic identification is demonstrated by a set of experiments conducted on 19 MB of Japanese newspaper articles. The paper also brings concepts from Rhetorical Structure Theory (RST) to the statistical analysis of text structure. Finally, it is shown that information on text structure is more effective for large documents than for small documents.
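The idea of scoring each segment against the title can be illustrated with a minimal sketch. The summary does not say which similarity measure the paper uses, so cosine similarity over bag-of-words counts is an assumption here, as are the whitespace tokenizer and the function names `tokenize`, `cosine_similarity`, and `rank_segments_by_title`. Japanese text would need a morphological analyzer in place of the whitespace split.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def cosine_similarity(a, b):
    # a and b are Counter objects mapping term -> count.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments_by_title(title, segments):
    # Returns (score, segment_index) pairs, most title-like segment first.
    # Ties are broken by original segment order.
    title_vec = Counter(tokenize(title))
    scored = [
        (cosine_similarity(title_vec, Counter(tokenize(segment))), index)
        for index, segment in enumerate(segments)
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    title = "bank raises interest rates"
    segments = [
        "the weather was mild on tuesday",
        "the central bank raises interest rates again",
        "bank staff met",
    ]
    for score, index in rank_segments_by_title(title, segments):
        print(f"{score:.3f}  segment {index}: {segments[index]}")
```

Segments that share more vocabulary with the title rank higher. In a pipeline along the lines the summary describes, such scores could be used to decide which parts of a text to weight when selecting topical words, though how the paper combines them with its probabilistic model is not specified here.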

Cited by 17 · Related articles · All 5 versions
