Hamshahri: A standard Persian text collection

A AleAhmad, H Amiri, E Darrudi, M Rahgozar… - Knowledge-Based …, 2009 - Elsevier
The Persian language is one of the dominant languages in the Middle East, so there is a
significant amount of Persian documents available on the Web. Due to the different nature of …

Building a test collection for Sorani Kurdish

KS Esmaili, D Eliassi, S Salavati… - 2013 ACS …, 2013 - ieeexplore.ieee.org
Despite having a large number of speakers, Sorani, one of the two principal branches of
the Kurdish language, is among the less-resourced languages. This paper reports on the …

Automatic identification of light stop words for Persian information retrieval systems

M Sadeghi, J Vegas - Journal of information science, 2014 - journals.sagepub.com
Stop word identification is one of the most important tasks for many text processing
applications such as information retrieval. Stop words occur too frequently in documents in a …

Construction and analysis of a small Chinese information retrieval test collection

徐建民, 王平 - 情报杂志, 2009 - cqvip.com
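Against the background of growing attention to information retrieval research in China, this paper introduces the significance of building a small Chinese test collection and the current state of test collection
research. Drawing on the experience of building test collections abroad, it discusses the construction method for a small Chinese information retrieval test collection …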

Towards Kurdish information retrieval

KS Esmaili, S Salavati, A Datta - ACM Transactions on Asian Language …, 2014 - dl.acm.org
The Kurdish language is an Indo-European language spoken in Kurdistan, a large
geographical region in the Middle East. Despite having a large number of speakers, Kurdish …

CURE: Collection for Urdu information retrieval evaluation and ranking

M Iqbal, B Tahir, MA Mehmood - … International Conference on …, 2021 - ieeexplore.ieee.org
Urdu is a widely spoken language with 163 million speakers across the globe. Information
Retrieval (IR) for Urdu entails special consideration by the research community due to its rich …

Building a Nasa Yuwe language test collection

LM Sierra, CA Cobos, JC Corrales… - … Linguistics and Intelligent …, 2015 - Springer
Nasa Yuwe, the language of the Paez people in Colombia, is currently an endangered
language [1]. The Nasa community has therefore been reviewing different strategies with the …

A framework for test topic generation

J Chen, M Namgoong, G Cao - IConference 2016 Proceedings, 2016 - ideals.illinois.edu
This study proposes a test topic generation framework through an analysis of existing
literature. The framework contains three components, including a list of questions for eliciting …

Towards acquisition of a thematic Persian corpus from the Tebyan Portal: TebCorp

SN Khalifehsoltani, A Cholmaghani… - 2010 2nd …, 2010 - ieeexplore.ieee.org
The TebCorp collection is a large thematic modern Persian text collection which consists of
500 MB of text from Tebyan Portal. TebCorp contains more than 93,000 articles in 1097 …

[PDF] Kyumars Sheykh Esmaili, Nanyang Technological University

S Salavati, A Datta - academia.edu
With increasingly higher numbers of non-English language web searchers, the problems of
efficient handling of non-English documents and user queries are becoming major issues for …