Authors
Zobia Rehman, Waqas Anwar, Usama Ijaz Bajwa
Publication date
2011/11/8
Journal
Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), IJCNLP
Volume
2011
Pages
40-45
Description
Urdu is a morphologically rich language whose characters differ in nature. Tokenization and sentence boundary disambiguation of Urdu text are difficult compared with a language like English. The major hurdle for tokenization is the improper use of spaces between words, whereas the absence of case distinction makes sentence boundary detection a difficult task. This paper identifies some issues regarding both of these language processing tasks.
Total citations
Citations per year, 2012-2023 (bar chart not reproduced)
Scholar articles
Z Rehman, W Anwar, UI Bajwa - Proceedings of the 2nd workshop on south southeast …, 2011