Unknown word detection for Chinese by a corpus-based learning method

KJ Chen, MH Bai - … & Chinese Language Processing, Volume 3 …, 1998 - aclanthology.org
One of the most prominent problems in computer processing of the Chinese language is
identification of the words in a sentence. Since there are no blanks to mark word boundaries …

Unknown word extraction for Chinese documents

KJ Chen, WY Ma - … 2002: The 19th International Conference on …, 2002 - aclanthology.org
There is no blank to mark word boundaries in Chinese text. As a result, identifying words is
difficult, because of segmentation ambiguities and occurrences of unknown words …

Chinese unknown word identification using character-based tagging and chunking

CL Goh, M Asahara, Y Matsumoto - … of 41st Annual Meeting of the …, 2003 - aclanthology.org
Since written Chinese has no space to delimit words, segmenting Chinese texts becomes an
essential task. During this task, the problem of unknown word occurs. It is impossible to …

Statistically-enhanced new word identification in a rule-based Chinese system

A Wu, Z Jiang - Second Chinese Language Processing Workshop, 2000 - aclanthology.org
This paper presents a mechanism of new word identification in Chinese text where
probabilities are used to filter candidate character strings and to assign POS to the selected …

Unknown word detection and segmentation of Chinese using statistical and heuristic knowledge

JY Nie, ML Hannan, W Jin - Communications of COLIPS, 1995 - researchgate.net
A sentence in Chinese is written as a string of characters without separation between words.
Before linguistic analyses on a text, it has to be first segmented into a sequence of words …

A bottom-up merging algorithm for Chinese unknown word extraction

WY Ma, KJ Chen - Proceedings of the second SIGHAN workshop …, 2003 - aclanthology.org
Statistical methods for extracting Chinese unknown words usually suffer a problem that
superfluous character strings with strong statistical associations are extracted as well. To …

Automatic corpus-based Thai word extraction with the C4.5 learning algorithm

V Sornlertlamvanich, T Potipiti… - COLING 2000 Volume …, 2000 - aclanthology.org
Abstract “Word” is difficult to define in the languages that do not exhibit explicit word
boundary, such as Thai. Traditional methods on defining words for this kind of languages …

An unsupervised iterative method for Chinese new lexicon extraction

JS Chang, KY Su - … & Chinese Language Processing, Volume 2 …, 1997 - aclanthology.org
An unsupervised iterative approach for extracting a new lexicon (or unknown words) from a
Chinese text corpus is proposed in this paper. Instead of using a non-iterative segmentation …

Word identification for Mandarin Chinese sentences

KJ Chen, SH Liu - COLING 1992 Volume 1: The 14th International …, 1992 - aclanthology.org
Chinese sentences are composed with string of characters without blanks to mark words.
However the basic unit for sentence parsing and understanding is word. Therefore the first …

Recognizing unregistered names for Mandarin word identification

LJ Wang, WC Li, CH Chang - COLING 1992 Volume 4: The 14th …, 1992 - aclanthology.org
Word Identification has been an important and active issue in Chinese Natural Language
Processing. In this paper, a new mechanism, based on the concept of sublanguage, is …