An efficient algorithm for unsupervised word segmentation with branching entropy and MDL

V Zhikov, H Takamura, M Okumura - Information and Media …, 2013 - jstage.jst.go.jp
This paper proposes a fast and simple unsupervised word segmentation algorithm that
utilizes the local predictability of adjacent character sequences, while searching for a least …

A Language Independent n-Gram Model for Word Segmentation

SS Kang, KB Hwang - AI 2006: Advances in Artificial Intelligence: 19th …, 2006 - Springer
Word segmentation is an essential first step in the processing of Far East Asian languages
(i.e., Chinese, Japanese, and Korean), which heavily influences subsequent processes such …

Unsupervised word segmentation with bi-directional neural language model

L Wang, X Zheng - ACM Transactions on Asian and Low-Resource …, 2022 - dl.acm.org
We propose an unsupervised word segmentation model, in which for each unlabelled
sentence sample, the learning objective is to maximize the generation probability of the …

A word segmentation algorithm for Chinese language based on N-gram models and machine learning

Y Wu, G Wei, H Li - 电子与信息学报, 2001 - jeit.ac.cn
Automatic word segmentation for the Chinese language is a fundamental and difficult
problem in the field of computer Chinese language information processing. This paper …

Gated recursive neural network for Chinese word segmentation

X Chen, X Qiu, C Zhu, XJ Huang - … of the 53rd Annual Meeting of …, 2015 - aclanthology.org
Recently, neural network models for natural language processing tasks have been
increasingly focused on for their ability of alleviating the burden of manual feature …

Word segmentation on Chinese mirco-blog data with a linear-time incremental model

K Zhang, M Sun, C Zhou - … of the Second CIPS-SIGHAN Joint …, 2012 - aclanthology.org
This paper describes the model we designed for the word segmentation bakeoff on Chinese
micro-blog data in the 2nd CIPS-SIGHAN joint conference on Chinese language processing …

A Hierarchical EM Approach to Word Segmentation.

F Peng, D Schuurmans - NLPRS, 2001 - Citeseer
We propose a simple two-level hierarchical probability model for unsupervised word
segmentation. By treating words as strings composed of morphemes/phonemes which are …

A trainable rule-based algorithm for word segmentation

DD Palmer - 35th Annual Meeting of the Association for …, 1997 - aclanthology.org
This paper presents a trainable rule-based algorithm for performing word segmentation. The
algorithm provides a simple, language-independent alternative to large-scale lexical-based …

On the difficulty of segmenting words with attention

R Sanabria, H Tang, S Goldwater - arXiv preprint arXiv:2109.10107, 2021 - arxiv.org
Word segmentation, the problem of finding word boundaries in speech, is of interest for a
range of tasks. Previous papers have suggested that for sequence-to-sequence models …

Long short-term memory neural networks for chinese word segmentation

X Chen, X Qiu, C Zhu, P Liu… - Proceedings of the 2015 …, 2015 - aclanthology.org
Currently, most state-of-the-art methods for Chinese word segmentation are based on
supervised learning, whose features are mostly extracted from a local context. These …