Statistically-enhanced new word identification in a rule-based Chinese system

A Wu, Z Jiang - Second Chinese Language Processing Workshop, 2000 - aclanthology.org
This paper presents a mechanism of new word identification in Chinese text where
probabilities are used to filter candidate character strings and to assign POS to the selected …

Unknown word detection for Chinese by a corpus-based learning method

KJ Chen, MH Bai - … & Chinese Language Processing, Volume 3 …, 1998 - aclanthology.org
One of the most prominent problems in computer processing of the Chinese language is
identification of the words in a sentence. Since there are no blanks to mark word boundaries …

Automatic extraction of new words from Japanese texts using generalized forward-backward search

M Nagata - Conference on Empirical Methods in Natural …, 1996 - aclanthology.org
We present a novel new word extraction method from Japanese texts based on expected
word frequencies. First, we compute expected word frequencies from Japanese texts using a …

A bottom-up merging algorithm for Chinese unknown word extraction

WY Ma, KJ Chen - Proceedings of the second SIGHAN workshop …, 2003 - aclanthology.org
Statistical methods for extracting Chinese unknown words usually suffer a problem that
superfluous character strings with strong statistical associations are extracted as well. To …

Unknown word extraction for Chinese documents

KJ Chen, WY Ma - … 2002: The 19th International Conference on …, 2002 - aclanthology.org
There is no blank to mark word boundaries in Chinese text. As a result, identifying words is
difficult, because of segmentation ambiguities and occurrences of unknown words …

The use of SVM for Chinese new word identification

H Li, CN Huang, J Gao, X Fan - … , Hainan Island, China, March 22-24, 2004 …, 2005 - Springer
We present a study of new word identification (NWI) to improve the performance of a
Chinese word segmenter. In this paper the distribution and types of new words are …

Japanese unknown word identification by character-based chunking

M Asahara, Y Matsumoto - COLING 2004: Proceedings of the 20th …, 2004 - aclanthology.org
We introduce a character-based chunking for unknown word identification in Japanese text.
A major advantage of our method is an ability to detect low frequency unknown words of …

Chinese new word identification: a latent discriminative model with global features

X Sun, DG Huang, HY Song, FJ Ren - Journal of computer science and …, 2011 - Springer
Chinese new words are particularly problematic in Chinese natural language processing.
With the fast development of Internet and information explosion, it is impossible to get a …

Recognizing unregistered names for Mandarin word identification

LJ Wang, WC Li, CH Chang - COLING 1992 Volume 4: The 14th …, 1992 - aclanthology.org
Word Identification has been an important and active issue in Chinese Natural Language
Processing. In this paper, a new mechanism, based on the concept of sublanguage, is …

An unsupervised iterative method for Chinese new lexicon extraction

JS Chang, KY Su - … & Chinese Language Processing, Volume 2 …, 1997 - aclanthology.org
An unsupervised iterative approach for extracting a new lexicon (or unknown words) from a
Chinese text corpus is proposed in this paper. Instead of using a non-iterative segmentation …