Authors
Nguyen Cam Tu, Tran Thi Oanh, Phan Xuan Hieu, Ha Quang Thuy
Publication date
2005
Journal
The 8th Conference on Some selection problems of Information Technology and Telecommunication
Pages
12
Description
Named entity recognition (NER) is the process of identifying entities of different types (e.g., person, location, organization, or date/time) mentioned in natural language documents. It is an important task in information extraction and a necessary precursor to higher-level natural language processing and understanding tasks such as text mining, text summarization, question answering, and machine translation. Further, automated recognition of named entities is essential to meeting the growing need to search, extract, and track relevant information on the web, and especially to building emerging semantic web technology.
This paper presents a machine learning approach to detecting named entities in Vietnamese free-text and web documents, based on conditional random fields (CRFs), a novel and powerful discriminative sequential learning model. A notable advantage of CRFs is their flexibility to incorporate a variety of arbitrary, overlapping, and non-independent features at different levels of granularity from the training data. As a result, our NER system can predict named entity types accurately by relying on various kinds of contextual evidence, ranging from linguistic information (i.e., words or phrases) and text format to a rich set of regular expressions. The experimental results (precision of 83.69%, recall of 87.41%, F1 score of 85.51%) on a moderate number of web documents show that our method not only achieves high accuracy but also deals effectively with potential ambiguity in Vietnamese.
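To make the idea of overlapping contextual evidence concrete, the following is a minimal sketch of per-token feature extraction of the kind a CRF-based tagger could consume. It is an illustration only, not the authors' feature set: the function names `token_features` and `sentence_features`, the feature keys, and both regular expressions are assumptions introduced here, since the abstract does not list the actual features or patterns.

```python
import re

# Hypothetical patterns for illustration; the paper's actual regular
# expressions are not given in the abstract.
DATE_RE = re.compile(r"^\d{1,2}/\d{1,2}(/\d{2,4})?$")
NUMBER_RE = re.compile(r"^\d+([.,]\d+)*$")


def token_features(tokens, i):
    """Return a dict of overlapping features describing tokens[i]."""
    word = tokens[i]
    feats = {
        # Linguistic evidence: the word itself.
        "word.lower": word.lower(),
        # Text-format evidence: capitalization.
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        # Regular-expression evidence.
        "matches_date": bool(DATE_RE.match(word)),
        "matches_number": bool(NUMBER_RE.match(word)),
    }
    # Context evidence: neighbouring words, or sentence-boundary flags.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens):
    """Build one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Calling `sentence_features(["Ha", "Noi", "ngay", "12/5/2005"])` yields one dictionary per token. Several features can fire for the same token (for example, capitalization together with the previous word), which is the kind of non-independent evidence a discriminative sequence model can weigh jointly.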
Total citations
[Chart: citations per year, 2007 to 2022]