Authors
Mohammed Faizan Farooqui, Mohammad Muqeem, Ahmad Sultan, Jabeen Nazeer, Hikmat AM Abdeljaber
Publication date
2023
Journal
International Journal of Advanced Computer Science and Applications
Volume
14
Issue
2
Publisher
Science and Information (SAI) Organization Limited
Description
Because the Internet is vast and continues to expand, search engines are the primary instruments for navigating and searching websites. By continuously downloading web pages for processing, search engines provide search facilities and maintain indices of web documents. This process of downloading web pages is known as web crawling. This paper proposes a solution to the network traffic problem in a migrating parallel web crawler. The primary benefit of a parallel web crawler is that it performs local analysis where the data resides rather than inside the search engine's repository. As a result, network load and traffic are greatly reduced, which improves the performance and efficiency of the crawling process. Another benefit of moving to a parallel crawler is that, as the web grows, it becomes important to parallelize crawling operations in order to retrieve web pages more quickly. The crawler is designed to yield high-quality pages. When the crawling process migrates to a host or server in a specific domain, it begins downloading pages from that domain. Incremental crawling maintains the quality of downloaded pages and keeps the pages in the local database up to date. The crawler is implemented in Java. The implemented model supports all aspects of a three-tier, real-time architecture. This paper presents an implementation of parallel web crawler migration. The method for efficient parallel web migration detects changes in page content and structure using neural network-based change detection techniques. This will produce high-quality pages, and change detection will always download new …
Scholar articles
MF Farooqui, M Muqeem, A Sultan, J Nazeer… - International Journal of Advanced Computer Science …, 2023