gathered by the crawler contain duplicated data. Different URLs with Similar Text are
generally known as DUST. DUST wastes disk storage, degrades the quality of rankings,
and worsens the user experience. To avoid such problems, many approaches have been
proposed, but the methods already available address only the detection and removal of
URL DUST. The proposed system can find …