Improving LDA topic models for microblogs via tweet pooling and automatic labeling

Authors

Rishabh Mehrotra, Scott Sanner, Wray Buntine, Lexing Xie

Publication date

2013/7/28

Book

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pages

889-892

Description

Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a variety of pooling schemes. An additional contribution of automatic hashtag labeling further improves on …

Total citations

Cited by 652

Citations per year: 2013: 4, 2014: 32, 2015: 35, 2016: 66, 2017: 59, 2018: 73, 2019: 83, 2020: 75, 2021: 74, 2022: 80, 2023: 50, 2024: 14
