Authors
Mazhar Ali Dootio, Asim Imdad Wagan
Publication date
2018/8/1
Journal
Data in Brief
Volume
19
Pages
1504-1514
Publisher
Elsevier
Description
The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistics problems of the Sindhi language. The data set presents information on the grammatical and morphological structure of Sindhi text, as well as the sentiment polarity of Sindhi lexicons. The data set may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, and semantic and sentiment analysis.
Total citations
Citations per year: 2019: 2, 2020: 1, 2021: 2, 2022: 2, 2023: 2, 2024: 1