Text document categorization by term association

ML Antonie, OR Zaiane - 2002 IEEE International Conference …, 2002 - ieeexplore.ieee.org
2002 IEEE International Conference on Data Mining, 2002. Proceedings., 2002 - ieeexplore.ieee.org
A good text classifier categorizes large sets of text documents efficiently, within a reasonable time frame and with acceptable accuracy, and provides human-readable classification rules that can be fine-tuned. In some application domains, fast training is a further asset. Many techniques and algorithms for automatic text categorization have been devised. According to the published literature, some are more accurate than others, and some provide more interpretable classification models than others. However, none combines all of the beneficial properties enumerated above. In this paper we present a novel approach to automatic text categorization that borrows from market basket analysis, using association rule mining from the data-mining field. We focus on two major problems: (1) finding the best term association rules in a textual database by generating and pruning rules; and (2) using the rules to build a text classifier. Our text categorization method proves to be efficient and effective, and experiments on well-known collections show that the classifier performs well. In addition, both training and classification are fast, and the generated rules are human readable.
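To make the two problems named in the abstract concrete, here is a minimal Python sketch of rule-based categorization by term association. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the generation or pruning details, so the level-wise (Apriori-style) search, the per-category support, the confidence-based pruning, the confidence-sum scoring, the names `mine_rules` and `classify`, the thresholds, and the toy documents are all hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations

# Toy training set (hypothetical): each document is (terms, category).
TRAIN = [
    (["wheat", "grain", "export"], "grain"),
    (["wheat", "crop", "harvest"], "grain"),
    (["grain", "wheat", "price"], "grain"),
    (["oil", "crude", "price"], "oil"),
    (["oil", "barrel", "export"], "oil"),
    (["crude", "oil", "opec"], "oil"),
]


def mine_rules(docs, min_support=0.5, min_confidence=0.7, max_len=2):
    """Mine rules of the form termset -> category.

    Support is the fraction of the category's documents containing the termset.
    Confidence is the fraction of all documents containing the termset that
    belong to the category. Both definitions are assumptions of this sketch.
    """
    by_category = defaultdict(list)
    for terms, category in docs:
        by_category[category].append(frozenset(terms))
    all_docs = [frozenset(terms) for terms, _ in docs]

    rules = []
    for category, cat_docs in by_category.items():
        frequent = set()
        for size in range(1, max_len + 1):
            if size == 1:
                candidates = {frozenset([t]) for d in cat_docs for t in d}
            else:
                # Generation step: join frequent termsets from the previous level.
                candidates = {a | b for a, b in combinations(frequent, 2)
                              if len(a | b) == size}
            frequent = set()
            for termset in candidates:
                in_cat = sum(termset <= d for d in cat_docs)
                if in_cat == 0 or in_cat / len(cat_docs) < min_support:
                    continue  # infrequent termsets are not extended further
                frequent.add(termset)
                overall = sum(termset <= d for d in all_docs)
                confidence = in_cat / overall
                # Pruning step: keep only rules that are confident enough.
                if confidence >= min_confidence:
                    rules.append((termset, category, confidence))
            if not frequent:
                break
    return rules


def classify(terms, rules):
    """Return the category whose matching rules have the largest summed confidence."""
    terms = frozenset(terms)
    scores = defaultdict(float)
    for termset, category, confidence in rules:
        if termset <= terms:
            scores[category] += confidence
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    rules = mine_rules(TRAIN)
    for termset, category, confidence in rules:
        print(sorted(termset), "->", category, round(confidence, 2))
    print(classify(["wheat", "price", "harvest"], rules))
```

Each mined rule is a plain `termset -> category` pair with a confidence value, which is what makes this kind of classifier easy to inspect and hand-tune. A document that matches no rule is left unclassified (`None`) in this sketch.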