Authors
Alexandra Schofield, Måns Magnusson, David Mimno
Publication date
2017/4
Conference
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Pages
432-436
Description
It is often assumed that topic models benefit from the use of a manually curated stopword list. Constructing this list is time-consuming and often subject to user judgments about what kinds of words are important to the model and the application. Although stopword removal clearly affects which word types appear as most probable terms in topics, we argue that this improvement is superficial, and that topic inference benefits little from the practice of removing stopwords beyond very frequent terms. Removing corpus-specific stopwords after model inference is more transparent and produces similar results to removing those words prior to inference.
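The abstract contrasts two workflows: trimming a curated stopword list from the corpus before inference, versus removing only very frequent terms up front and filtering corpus-specific stopwords out of the topic-word distributions after inference. The sketch below illustrates the mechanics of both filters on toy data. It does not run topic inference, and it is not the authors' code. The function names `frequent_terms` and `top_terms`, the 0.5 document-frequency cutoff, and the example counts are all hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter


def frequent_terms(docs, max_doc_fraction=0.5):
    """Return word types that occur in more than max_doc_fraction of documents.

    Hypothetical pre-inference filter for "very frequent terms"; the 0.5
    default is an arbitrary assumption, not a value from the paper.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each type at most once per document
    return {w for w, df in doc_freq.items() if df / n_docs > max_doc_fraction}


def top_terms(topic_word_counts, k=5, exclude=frozenset()):
    """Return the k most probable terms per topic after dropping excluded words.

    Hypothetical post-inference filter: excluded words are removed from each
    topic's counts and the remaining counts are renormalized to probabilities.
    """
    result = []
    for counts in topic_word_counts:
        kept = {w: c for w, c in counts.items() if w not in exclude}
        total = sum(kept.values())
        if total == 0:
            result.append([])  # nothing left to rank in this topic
            continue
        # Sort by descending count, breaking ties alphabetically.
        ranked = sorted(kept.items(), key=lambda item: (-item[1], item[0]))
        result.append([(w, c / total) for w, c in ranked[:k]])
    return result


if __name__ == "__main__":
    docs = [
        ["the", "model", "topic", "the"],
        ["the", "word", "list"],
        ["a", "topic", "word"],
        ["the", "stopword", "list"],
    ]
    # Toy topic-word counts standing in for the output of an inferred model.
    topics = [
        {"the": 50, "topic": 30, "model": 10},
        {"the": 40, "stopword": 25, "list": 15, "word": 10},
    ]
    stop = frequent_terms(docs)
    print(stop)
    print(top_terms(topics, k=2))                # "the" dominates both topics
    print(top_terms(topics, k=2, exclude=stop))  # filtered after inference
```

Because the post-inference filter only changes how topics are displayed, the excluded list stays visible and can be revised without retraining. That is one reading of the abstract's "more transparent" claim; whether the remaining topics match those from a model trained on a pre-filtered corpus is the empirical question the paper addresses.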
Total citations
[Chart: citations per year, 2017 to 2024]
Scholar articles
A Schofield, M Magnusson, D Mimno - Proceedings of the 15th Conference of the European …, 2017