Authors
Wei Shao, Bolin Hua, Qiang Ma, Jiaying Liu, Hongwei He, Keqi Chen
Publication date
2020
Workshop paper
EEKE@JCDL
Pages
86-88
Abstract
Finding new terminology is a kind of named entity recognition (NER) problem. However, many high-performance methods require labelled data. Although they obtain excellent results on their training and test data, they struggle to process new, unlabelled data. One factor behind this gap is that the features of new text differ from the features a model learned from its training data, because the two come from different domains. New scientific texts also usually lack the labels needed for extraction. An unsupervised method that can adapt to different domains is therefore needed. To address this problem, we propose an unsupervised method based on sentence patterns and part-of-speech (POS) sequences. Specifically, we initialize a few patterns to extract terminology from certain sentences, which yields some terms and their POS sequences. We then search the sentences not matched by the initial patterns for the same POS sequences. If a sentence matches, we use suitable words in that sentence to replace the extendable parts of the initial patterns. This produces new patterns, which in turn extract more terminology. After several iterations, most of the terminology in scientific sentences can be extracted.
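The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual patterns or replacement rules, so the sketch assumes that a pattern is a single trigger verb followed by an optional determiner and then the term, that the verb is the "extendable part", and that terms consist only of adjectives and nouns. The names `extract_after_trigger`, `find_new_trigger`, `bootstrap`, and the hand-tagged toy corpus `SENTENCES` are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]          # (word, POS tag), tagged by hand for this toy example
TERM_TAGS = {"JJ", "NN"}         # assumption: terms consist of adjectives and nouns

# Hypothetical toy corpus; a real system would run a POS tagger over scientific text.
SENTENCES: List[List[Token]] = [
    [("We", "PRP"), ("propose", "VBP"), ("a", "DT"), ("conditional", "JJ"),
     ("random", "JJ"), ("field", "NN"), (".", ".")],
    [("This", "DT"), ("paper", "NN"), ("introduces", "VBZ"), ("a", "DT"),
     ("recurrent", "JJ"), ("neural", "JJ"), ("network", "NN"), (".", ".")],
    [("The", "DT"), ("model", "NN"), ("introduces", "VBZ"), ("attention", "NN"),
     ("mechanism", "NN"), (".", ".")],
]

def extract_after_trigger(sent: List[Token], trigger: str) -> Optional[List[Token]]:
    """Apply the pattern '<trigger> [DT] <term>' and return the term tokens, if any."""
    words = [w.lower() for w, _ in sent]
    if trigger not in words:
        return None
    i = words.index(trigger) + 1
    if i < len(sent) and sent[i][1] == "DT":   # skip one optional determiner
        i += 1
    j = i
    while j < len(sent) and sent[j][1] in TERM_TAGS:
        j += 1
    return sent[i:j] if j > i else None

def find_new_trigger(sent: List[Token], pos_seqs: Set[Tuple[str, ...]]) -> Optional[str]:
    """Look for a known term POS sequence; return the verb just before it, if any."""
    tags = [t for _, t in sent]
    for seq in sorted(pos_seqs):
        n = len(seq)
        for start in range(1, len(tags) - n + 1):
            if tuple(tags[start:start + n]) != seq:
                continue
            k = start - 1
            if tags[k] == "DT" and k > 0:      # step over an optional determiner
                k -= 1
            if tags[k].startswith("VB"):       # the verb replaces the extendable part
                return sent[k][0].lower()
    return None

def bootstrap(sentences: List[List[Token]], seeds: Set[str], max_iters: int = 5):
    """Alternate between extracting terms and deriving new trigger patterns."""
    triggers, terms, pos_seqs = set(seeds), set(), set()
    for _ in range(max_iters):
        unmatched = []
        for sent in sentences:
            hit = None
            for trig in sorted(triggers):
                hit = extract_after_trigger(sent, trig)
                if hit:
                    break
            if hit:
                terms.add(" ".join(w for w, _ in hit))
                pos_seqs.add(tuple(t for _, t in hit))
            else:
                unmatched.append(sent)
        new = {t for t in (find_new_trigger(s, pos_seqs) for s in unmatched) if t}
        new -= triggers
        if not new:                            # no new patterns: stop iterating
            break
        triggers |= new
    return terms, triggers

terms, triggers = bootstrap(SENTENCES, {"propose"})
print(sorted(terms))
# ['attention mechanism', 'conditional random field', 'recurrent neural network']
print(sorted(triggers))
# ['introduces', 'propose']
```

In this toy run the seed pattern matches only the first sentence. The POS sequence of its term (JJ JJ NN) then reappears in the second sentence, where the preceding verb "introduces" becomes a new pattern, and that pattern in turn extracts a term with a different POS sequence from the third sentence.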