Authors
Mazhar Ali, Asim Imdad Wagan
Publication date
2019/1/1
Journal
Mehran University Research Journal of Engineering & Technology
Volume
38
Issue
1
Pages
185-196
Publisher
Mehran University of Engineering & Technology
Description
A linguistic corpus of the Sindhi language is important for computational linguistics, machine learning, identification and analysis of language features, semantic and sentiment analysis, information retrieval, and so on. Little computational linguistics work has been done on Sindhi text, whereas English, Arabic, Urdu, and some other languages are fully resourced computationally. The grammar and morphemes of these languages have been analyzed properly using different machine learning methods. Research and development work on computational linguistics for the Sindhi language is currently in progress. This study aims to develop a Sindhi annotated corpus using the universal POS (Part of Speech) tag set and a Sindhi POS tag set for the purpose of analyzing language features and variation. The features are extracted using TF-IDF (Term Frequency and Inverse Document Frequency …
Total citations
[Chart: citations per year, 2019 to 2023]