Automatic extraction of new words from Japanese texts using generalized forward-backward search

M Nagata - Conference on Empirical Methods in Natural …, 1996 - aclanthology.org
We present a novel new word extraction method from Japanese texts based on expected
word frequencies. First, we compute expected word frequencies from Japanese texts using a …

A self-organizing Japanese word segmenter using heuristic word identification and re-estimation

M Nagata - Fifth Workshop on Very Large Corpora, 1997 - aclanthology.org
We present a self-organized method to build a stochastic Japanese word segmenter from a
small number of basic words and a large amount of unsegmented training text. It consists of …

An unsupervised iterative method for Chinese new lexicon extraction

JS Chang, KY Su - … & Chinese Language Processing, Volume 2 …, 1997 - aclanthology.org
An unsupervised iterative approach for extracting a new lexicon (or unknown words) from a
Chinese text corpus is proposed in this paper. Instead of using a non-iterative segmentation …

Discovering Chinese words from unsegmented text

X Ge, W Pratt, P Smyth - Proceedings of the 22nd annual international …, 1999 - dl.acm.org
In English written text, words are separated by spaces, but in written Chinese text,
there are no such separators between words. (See Figure 1.) Thus, effective information …

Unknown Chinese word extraction based on variety of overlapping strings

Y Ye, Q Wu, Y Li, KP Chow, LCK Hui, SM Yiu - Information processing & …, 2013 - Elsevier
Not all languages, e.g., Chinese, have delimiters for words. To extract words from a sentence
in these languages, we usually rely on a dictionary for known words. For unknown words …

KR-WordRank: An unsupervised Korean word extraction method based on WordRank

H Kim, S Cho, P Kang - Journal of Korean Institute of Industrial …, 2014 - koreascience.kr
A word is the smallest unit for text analysis, and the premise behind most text-mining
algorithms is that the words in given documents can be perfectly recognized. However, the …

Statistically-enhanced new word identification in a rule-based Chinese system

A Wu, Z Jiang - Second Chinese Language Processing Workshop, 2000 - aclanthology.org
This paper presents a mechanism of new word identification in Chinese text where
probabilities are used to filter candidate character strings and to assign POS to the selected …

Unknown word extraction for Chinese documents

KJ Chen, WY Ma - … 2002: The 19th International Conference on …, 2002 - aclanthology.org
There is no blank to mark word boundaries in Chinese text. As a result, identifying words is
difficult, because of segmentation ambiguities and occurrences of unknown words …

A statistical corpus-based term extractor

P Pantel, D Lin - Advances in Artificial Intelligence: 14th Biennial …, 2001 - Springer
Term extraction is an important problem in natural language processing. In this paper, we
propose a language-independent statistical corpus-based term extraction algorithm. In …

KR-WordRank: An unsupervised Korean word extraction method improving on WordRank

H Kim, S Cho, P Kang - Journal of the Korean Institute of Industrial Engineers, 2014 - researchgate.net
A word is the smallest unit for text analysis, and the premise behind most text-mining
algorithms is that the words in given documents can be perfectly recognized. However, the …