Deep-KeywordNet: automated English keyword extraction in documents using deep keyword network based ranking

R Khatun, A Sarkar - Multimedia Tools and Applications, 2024 - Springer
Abstract
During information retrieval, users locate relevant web pages by entering specific keywords. If users provide inaccurate keywords, or if the keywords are absent from the intended page, retrieval effectiveness is significantly compromised. Keywords therefore remain of utmost importance in text processing. In intricate contexts in particular, manual analysis by readers can be both time-intensive and infeasible. Most existing methods achieve limited accuracy, leading to elevated error rates and compromised training capabilities. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. It comprises several key stages: data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking. The effectiveness of the keyword extraction process is evaluated on the 500N-KPCrowd, KPTimes, and KP20k datasets. Text pre-processing consists of stop-word removal, Parts of Speech (PoS) tagging, stemming, and sentence segmentation. The pre-processed text is fed into the Deep-KeywordNet model and tokenized into individual words. The Word2Vec (W2V) Skip-gram embedding layer facilitates the categorization of distributed vector representations. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), together with the softmax layer, assigns class labels, and the network's loss is optimized with the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. The Python implementation achieves an overall accuracy of 98.87% with minimized time complexity.
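The pre-processing and ranking stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define TF-IADF, so plain TF-IDF is used here as a stand-in, and PoS tagging, stemming, the Word2Vec layer, and the Attn Bi-GCNN classifier are omitted. The stop-word list and the function names (`split_sentences`, `tokenize`, `rank_keywords`) are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on",
              "and", "to", "by", "for", "with"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def rank_keywords(docs, doc_index, top_k=3):
    """Rank the tokens of one document by plain TF-IDF.

    This is a stand-in for the paper's TF-IADF model, whose formula
    is not given in the abstract.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the target document.
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())

    scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


docs = [
    "Keyword extraction ranks keyword candidates.",
    "Deep networks classify tokens.",
    "Networks rank documents.",
]
top = rank_keywords(docs, 0)
```

With this toy corpus, `top[0][0]` is `"keyword"`, because it occurs twice in the first document and in no other. In the paper's pipeline the candidate set would come from the classifier's labels rather than from every surviving token.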