作者
Naseer Ahmed Sajid, Atta Rahman, Munir Ahmad, Dhiaa Musleh, Mohammed Imran Basheer Ahmed, Reem Alassaf, Sghaier Chabani, Mohammed Salih Ahmed, Asiya Abdus Salam, Dania AlKhulaifi
发表日期
2023/6/3
来源
Applied Sciences
卷号
13
期号
11
页码范围
6804
出版商
MDPI
简介
Over the decades, a tremendous increase has been witnessed in the production of documents available in digital form. The increased production of documents has gained so much momentum that their rate of production jumps two-fold every five years. These articles are searched over the internet via search engines, digital libraries, and citation indexes. However, the retrieval of relevant research papers for user queries is still a pipedream. This is because scientific documents are not indexed based on some subject classification hierarchies. Hence, the classification of these documents becomes a challenging task for the researchers. Classification of the documents can be two-fold: one way is to assign a single label to each document and the other is to assign multi-labels to each document based on its belonging domains. Classification of the documents can be performed by using either the available metadata or the whole content of the documents. While performing classification, there are many challenges which may belong to the dataset, feature selection technique, preprocessing methodology, and which classification model is suitable for the classification of the documents. This paper highlights the issues for single-label and multi-label classification by using either metadata or content of the documents and why metadata-based approaches are better than content-based approaches in terms of feasibility.
引用总数