
[PDF] from aclanthology.org

Challenges in Urdu text tokenization and sentence boundary disambiguation

Authors

Zobia Rehman, Waqas Anwar, Usama Ijaz Bajwa

Publication date

2011/11/8

Journal

Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), IJCNLP

Volume

2011

Pages

40-45

Description

Urdu is a morphologically rich language whose characters differ in nature. Urdu text tokenization and sentence boundary disambiguation are difficult compared with languages like English. The major hurdle for tokenization is the improper use of spaces between words, whereas the absence of case discrimination makes sentence boundary detection a difficult task. This paper identifies some issues regarding both of these language processing tasks.

Total citations

Cited by 38

Citations per year: 2012: 1; 2013: 4; 2014: 1; 2015: 1; 2016: 5; 2017: 3; 2018: 4; 2019: 2; 2020: 4; 2021: 3; 2022: 2; 2023: 8

Scholar articles

Challenges in Urdu text tokenization and sentence boundary disambiguation

Z Rehman, W Anwar, UI Bajwa - Proceedings of the 2nd workshop on south southeast …, 2011

Cited by 38 · Related articles · All 3 versions