Domain adaptive bootstrapping for named entity recognition

D Wu, WS Lee, N Ye, HL Chieu - EMNLP'09 Proceedings of the …, 2009 - eprints.qut.edu.au
EMNLP'09 Proceedings of the 2009 Conference on Empirical Methods in …, 2009
Bootstrapping is the process of improving the performance of a trained classifier by iteratively adding data that is labeled by the classifier itself to the training set, and retraining the classifier. It is often used in situations where labeled training data is scarce but unlabeled data is abundant. In this paper, we consider the problem of domain adaptation: the situation where training data may not be scarce, but belongs to a different domain from the target application domain. As the distribution of unlabeled data is different from the training data, standard bootstrapping often has difficulty selecting informative data to add to the training set. We propose an effective domain adaptive bootstrapping algorithm that selects unlabeled target domain data that are informative about the target domain and easy to automatically label correctly. We call these instances bridges, as they are used to bridge the source domain to the target domain. We show that the method outperforms supervised, transductive and bootstrapping algorithms on the named entity recognition task. © 2009 ACL and AFNLP.
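The generic bootstrapping loop described in the abstract's opening sentence (train, label unlabeled data with the classifier itself, add the most confidently labeled items to the training set, retrain) can be sketched as below. This is only an illustrative sketch of plain bootstrapping, not the paper's domain adaptive algorithm: the abstract does not give the bridge selection criterion, so none is attempted here. The nearest-centroid classifier, the margin-based confidence score, the 2-D toy features, and all function names (train, predict, bootstrap) are assumptions made for illustration.

```python
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    return tuple(mean(c) for c in zip(*points))


def train(labeled):
    """Toy nearest-centroid classifier (an assumption): one centroid per label."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}


def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def predict(model, x):
    """Return (label, margin). The margin is the distance gap between the two
    nearest centroids and serves as a stand-in confidence score.
    Requires at least two labels in the model."""
    ranked = sorted((dist(x, c), y) for y, c in model.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]


def bootstrap(labeled, unlabeled, rounds=5, batch=2):
    """Plain bootstrapping: repeatedly self-label the most confident
    unlabeled items, add them to the training set, and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        scored = [(predict(model, x), x) for x in pool]
        scored.sort(key=lambda t: t[0][1], reverse=True)  # most confident first
        for (label, _margin), x in scored[:batch]:
            labeled.append((x, label))  # trust the classifier's own label
            pool.remove(x)
        model = train(labeled)  # retrain on the enlarged training set
    return model, labeled


# Toy data (hypothetical): 2-D features standing in for token representations.
source = [((0.0, 0.0), "O"), ((1.0, 1.0), "O"),
          ((5.0, 5.0), "PER"), ((6.0, 6.0), "PER")]
target = [(0.5, 0.5), (5.5, 5.5), (2.0, 2.0), (4.5, 4.0)]
model, grown = bootstrap(source, target)
```

Selecting purely by confidence is the step the abstract identifies as the weak point under domain shift: items the source-trained classifier is most sure about tend to resemble the source data, so they may carry little information about the target domain.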