Authors
Geli Fei, Bing Liu
Publication date
2015
Conference
EMNLP
Pages
2347-2356
Description
In a typical social media content analysis task, the user is interested in analyzing posts on a particular topic. Identifying such posts is often formulated as a classification problem. However, this problem is challenging. One key issue is covariate shift, that is, the training data is not fully representative of the test data. We observed that the covariate shift occurs mainly in the negative data: topics discussed in social media are highly diverse and numerous, but the user-labeled negative training data may cover only a small number of them. This paper proposes a novel technique to solve this problem. Its key novelty is the transformation of the document representation from the traditional n-gram feature space to a center-based similarity (CBS) space. In the CBS space, the covariate shift problem is significantly mitigated, which enables us to build much better classifiers. Experimental results show that the proposed approach markedly improves classification.
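The description does not say how the CBS transformation is computed. The following is a minimal sketch under stated assumptions: each document is mapped to a short vector of cosine similarities to the centroid (center) of the positive training documents, with one similarity per document representation. The chosen representations (unigram and bigram counts and binary indicators), the whitespace tokenizer, and the helper names `REPRESENTATIONS`, `fit_centers` and `cbs_features` are hypothetical illustrations, not the authors' exact configuration. The classifier that would be trained on these features is omitted.

```python
import math
from collections import Counter

# Hypothetical representations: (n-gram order, use binary weights instead of counts).
# The paper's actual set of representations and similarity measures may differ.
REPRESENTATIONS = [(1, False), (1, True), (2, False), (2, True)]


def ngrams(text, n):
    """Split on whitespace (a simplifying assumption) and return word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(text, n, binary):
    """Sparse n-gram vector as a dict: raw counts, or 1.0 per present n-gram."""
    counts = Counter(ngrams(text, n))
    return {g: (1.0 if binary else float(c)) for g, c in counts.items()}


def centroid(vectors):
    """Mean of sparse vectors; assumes at least one vector."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the weights of shared keys
    return {k: w / len(vectors) for k, w in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit_centers(positive_docs):
    """One positive-class center per (hypothetical) representation."""
    return [centroid([vectorize(d, n, b) for d in positive_docs])
            for n, b in REPRESENTATIONS]


def cbs_features(doc, centers):
    """Map a document from n-gram space into the low-dimensional CBS space."""
    return [cosine(vectorize(doc, n, b), c)
            for (n, b), c in zip(REPRESENTATIONS, centers)]


positives = ["the new phone has a great camera", "phone battery life is great"]
centers = fit_centers(positives)
print(cbs_features("great phone camera", centers))
print(cbs_features("election results announced today", centers))  # [0.0, 0.0, 0.0, 0.0]
```

In this sketch every document, whatever its topic, is reduced to a handful of similarities to the positive center, so a negative post on a topic never seen in training still tends to fall in the low-similarity region. This is one plausible reading of why the shift is mitigated in the CBS space, offered as an illustration rather than the paper's own argument.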
Total citations
[Chart: citations per year, 2016–2024]
Scholar articles
G Fei, B Liu - Proceedings of the 2015 conference on empirical …, 2015