Author
Bo Tang, Steven Kay, Haibo He
Publication date
2016/5/5
Journal
IEEE Transactions on Knowledge and Data Engineering
Volume
28
Issue
9
Pages
2508-2521
Publisher
IEEE
Description
Automated feature selection is important for text categorization: it reduces the feature size and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination (MD) and MD-χ² methods, for text categorization. The promising results of extensive experiments …
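For intuition only (this sketch is not part of the article, and it is not the authors' JMH-divergence or MD/MD-χ² criteria), the following minimal Python example shows the general idea of divergence-based feature ranking the abstract describes: model each term's class-conditional occurrence as a Bernoulli distribution and score it by the symmetric Jeffreys divergence between the two classes. The function names `kl_divergence`, `jeffreys_divergence`, and `rank_features_by_jeffreys` are illustrative, and the Bernoulli document-frequency model is an assumption made here for simplicity.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys_divergence(p, q):
    """Symmetric Jeffreys divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def rank_features_by_jeffreys(X, y):
    """Rank terms for a binary text-categorization task.

    X: (n_docs, n_terms) nonnegative count matrix; y: 0/1 class labels.
    Each term j is modeled as a Bernoulli variable (does it appear in a
    document?) and scored by the Jeffreys divergence between its two
    class-conditional distributions. Higher score = more discriminative.
    """
    X = np.asarray(X) > 0          # presence/absence per document
    y = np.asarray(y)
    df0 = X[y == 0].mean(axis=0)   # document frequency in class 0
    df1 = X[y == 1].mean(axis=0)   # document frequency in class 1
    scores = np.array([
        jeffreys_divergence([df0[j], 1.0 - df0[j]],
                            [df1[j], 1.0 - df1[j]])
        for j in range(X.shape[1])
    ])
    order = np.argsort(scores)[::-1]
    return order, scores

# Usage: keep the top-k terms as the selected feature set.
X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 1], [1, 3, 2]])
y = np.array([0, 0, 1, 1])
order, scores = rank_features_by_jeffreys(X, y)
top_k = order[:2]                  # indices of the 2 most discriminative terms
```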
Total citations
Citations per year: 2016: 10, 2017: 25, 2018: 41, 2019: 45, 2020: 46, 2021: 50, 2022: 30, 2023: 24, 2024: 11
Scholar articles
B Tang, S Kay, H He - IEEE Transactions on Knowledge and Data Engineering, 2016