Removing dust by metacrawler

S Deshmukh, P Chittekar - 2020 4th International Conference …, 2020 - ieeexplore.ieee.org

2020 4th International Conference on Trends in Electronics and …, 2020•ieeexplore.ieee.org

URLs collected by search engines often point to mirrored data, and some of the pages gathered by a crawler contain duplicated content. Different URLs with Similar Text are generally known as DUST. DUST wastes disk storage, degrades ranking quality, and worsens the user experience. Many approaches have been proposed to avoid these problems, but the available methods address only the detection and removal of URL DUST. The proposed system finds and removes both content DUST and URL DUST. It introduces a Metacrawler that crawls documents and gathers results from all three search engines. The content of each website is compared with that of the others using a k-gram paraphrase technique to eliminate mirrored data.
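The abstract does not spell out how the k-gram comparison works, so the following is only a minimal sketch of one common way to compare page content with k-grams. It assumes word-level k-grams and Jaccard similarity, which may differ from the paper's paraphrase technique. The function names, `k=3`, the `0.8` threshold, and the example URLs are all hypothetical choices for illustration.

```python
import re


def kgrams(text, k=3):
    """Return the set of word-level k-grams in text (assumed tokenization)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single gram holding all words, or nothing.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_content_dust(pages, k=3, threshold=0.8):
    """Return (url_a, url_b, similarity) for page pairs at or above threshold.

    pages maps URL -> page text. The threshold is an illustrative assumption,
    not a value taken from the paper.
    """
    urls = sorted(pages)
    grams = {url: kgrams(pages[url], k) for url in urls}
    pairs = []
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            sim = jaccard(grams[u], grams[v])
            if sim >= threshold:
                pairs.append((u, v, sim))
    return pairs


if __name__ == "__main__":
    pages = {
        "http://a.example/x": "the quick brown fox jumps over the lazy dog",
        "http://b.example/y": "The quick brown fox jumps over the lazy dog!",
        "http://c.example/z": "completely different text about web crawlers and search",
    }
    for u, v, sim in find_content_dust(pages):
        print(u, v, round(sim, 2))
```

In this sketch the pairwise comparison is quadratic in the number of pages, which is acceptable for merging a few result lists but would need hashing or sampling of k-grams at crawl scale.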

Cited by: 1
