Classifying XML documents based on structure/content similarity

G Xing, J Guo, Z Xia - International Workshop of the Initiative for the …, 2006 - Springer
G Xing, J Guo, Z Xia
International Workshop of the Initiative for the Evaluation of XML Retrieval, 2006 - Springer
Abstract
In this paper, we present a framework for classifying XML documents based on the structure/content similarity between them. First, an algorithm is proposed for computing the edit distance between an ordered labeled tree and a regular hedge grammar. The new edit distance gives a more precise measure of structural similarity than existing distance metrics in the literature. Second, we study schema extraction from XML documents and give an effective solution based on the minimum description length (MDL) principle. Our schema extraction method allows a trade-off between schema simplicity and precision based on the user's specification. Third, classification of XML documents is discussed, and the representation of XML documents based on their structure and content is also studied. The efficacy and efficiency of our methodology have been tested using the data sets from the XML Mining Challenge.
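The overall pipeline described in the abstract (extract one schema per class, measure the distance from a document to each schema, assign the nearest class) might be sketched roughly as follows. This is only an illustration under strong simplifying assumptions, not the authors' method: the "schema" here is just the set of root-to-element label paths seen in the training documents, and the "distance" is the fraction of a document's paths that the schema lacks. Both are stand-ins for the regular hedge grammar and the tree-to-grammar edit distance. The names `label_paths`, `extract_schema`, `distance`, and `classify`, as well as the sample documents, are hypothetical.

```python
import xml.etree.ElementTree as ET


def label_paths(xml_text):
    """Return the set of root-to-element label paths in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def extract_schema(docs):
    """Stand-in for schema extraction: the union of label paths over a class.

    Assumption: a real schema would be a regular hedge grammar, not a path set.
    """
    schema = set()
    for doc in docs:
        schema |= label_paths(doc)
    return schema


def distance(xml_text, schema):
    """Stand-in structural distance: fraction of document paths the schema lacks.

    Assumption: this replaces the tree-to-grammar edit distance for illustration.
    """
    paths = label_paths(xml_text)  # never empty: the root path is always present
    return len(paths - schema) / len(paths)


def classify(xml_text, schemas):
    """Assign the document to the class whose schema is nearest."""
    return min(schemas, key=lambda label: distance(xml_text, schemas[label]))


# Hypothetical training data: two classes with tiny example documents.
training = {
    "book": ["<book><title/><author/></book>", "<book><title/><isbn/></book>"],
    "article": ["<article><title/><journal/></article>"],
}
schemas = {label: extract_schema(docs) for label, docs in training.items()}

print(classify("<book><title/><author/><isbn/></book>", schemas))  # prints "book"
```

The path-set representation ignores sibling order and repetition, which is exactly the kind of structural detail an edit distance against a grammar is meant to capture, so the sketch conveys only the nearest-schema classification step.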