Authors
David Newman, Nagendra Koilada, Jey Han Lau, Timothy Baldwin
Publication date
2012/12
Conference
Proceedings of COLING 2012
Pages
2077-2092
Description
Automatically extracting terminology and index terms from scientific literature is useful for a variety of digital library, indexing and search applications. This task is non-trivial, complicated by domain-specific terminology and a steady introduction of new terminology. Correctly identifying nested terminology further adds to the challenge. We present a Dirichlet Process (DP) model of word segmentation where multiword segments are either retrieved from a cache or newly generated. We show how this DP-Segmentation model can be used to successfully extract nested terminology, outperforming previous methods for solving this problem.
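The cache-or-generate mechanism in the description is the Chinese restaurant process view of a Dirichlet Process. Below is a minimal sketch of the predictive probability of a multiword segment under a generic unigram DP segmentation model. It is not the authors' implementation. The base distribution (a geometric prior on segment length times a uniform probability per word) and the names `base_prob`, `dp_segment_prob`, `p_stop` and `vocab_size` are hypothetical. The paper's actual base distribution and inference procedure may differ.

```python
from collections import Counter


def base_prob(segment, vocab_size, p_stop=0.5):
    """Hypothetical base distribution P0 over multiword segments.

    Assumption: segment length is geometric (stop after each word with
    probability p_stop), and each word is drawn uniformly from a vocabulary
    of size vocab_size. This P0 sums to 1 over all non-empty segments.
    """
    n_words = len(segment)
    length_prob = (1 - p_stop) ** (n_words - 1) * p_stop
    return length_prob * (1.0 / vocab_size) ** n_words


def dp_segment_prob(segment, cache, alpha, vocab_size):
    """Predictive probability of `segment` under a DP with concentration alpha.

    The segment is either retrieved from the cache, with weight equal to how
    often it has been generated before, or newly generated from P0 with
    weight alpha (the Chinese restaurant process).
    """
    total = sum(cache.values())
    reuse = cache[segment]  # Counter returns 0 for unseen segments
    new = alpha * base_prob(segment, vocab_size)
    return (reuse + new) / (total + alpha)


# Usage: segments are tuples of words; the cache counts earlier segments.
cache = Counter({("dirichlet", "process"): 3, ("model",): 5})
p_multiword = dp_segment_prob(("dirichlet", "process"), cache, alpha=1.0, vocab_size=1000)
p_single = dp_segment_prob(("dirichlet",), cache, alpha=1.0, vocab_size=1000)
```

Because "dirichlet process" has already been cached as one unit, the model strongly prefers reusing it over generating "dirichlet" as a fresh single-word segment. This rich-get-richer behaviour is what lets a DP segmenter keep recurring multiword terms together.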
Total citations
[Chart: citations per year, 2013 to 2024]