Removing dust by metacrawler

S Deshmukh, P Chittekar - 2020 4th International Conference on Trends in Electronics and …, 2020 - ieeexplore.ieee.org
URLs collected by search engines often point to mirrored data, and some of the pages gathered by a crawler contain duplicated content. Different URLs with Similar Text are generally known as DUST. DUST wastes disk storage, degrades ranking quality, and worsens the user experience. Much research has been proposed to avoid these problems, but the methods already available address only the detection and removal of URL DUST. The system to be implemented can find and remove both content DUST and URL DUST. It introduces a metacrawler that crawls documents and gathers results from three search engines. The current method compares the content of every website with that of the others, using a k-gram paraphrase technique, to eliminate mirrored data.
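
As a rough illustration of k-gram content comparison, the sketch below splits each page into overlapping word k-grams and drops any page whose k-gram set closely matches a page already kept. This is not the authors' implementation: the abstract does not specify their algorithm, so the Jaccard similarity measure, the 0.8 threshold, and the names kgrams, jaccard, and remove_content_dust are all assumptions made for this example.

```python
def kgrams(text, k=3):
    """Return the set of overlapping word k-grams in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single gram holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_content_dust(pages, k=3, threshold=0.8):
    """Keep one URL per group of near-duplicate pages.

    pages maps URL -> page text. The threshold is an assumed value,
    not one taken from the paper.
    """
    kept = {}  # URL -> k-gram set of each page retained so far
    for url, text in pages.items():
        grams = kgrams(text, k)
        # Keep the page only if it is dissimilar to every retained page.
        if all(jaccard(grams, seen) < threshold for seen in kept.values()):
            kept[url] = grams
    return list(kept)


pages = {
    "http://a.example/x": "the quick brown fox jumps over the lazy dog",
    "http://b.example/y": "the quick brown fox jumps over the lazy dog",
    "http://c.example/z": "completely different text about web crawlers and search engines",
}
unique_urls = remove_content_dust(pages)
```

Here the second URL mirrors the first, so only the first and third URLs are retained. Comparing every page against every retained page is quadratic in the number of pages; a larger crawl would likely need hashing or indexing of the k-grams, which this sketch omits.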