Scalability challenges in web search engines

BB Cambazoglu, R Baeza-Yates - Advanced topics in information retrieval, 2011 - Springer
Continuous growth of the Web and user bases forces web search engine companies to
make costly investments on very large compute infrastructures. The scalability of these …

Change detection and notification of web pages: A survey

V Mallawaarachchi, L Meegahapola… - ACM Computing …, 2020 - dl.acm.org
The majority of currently available webpages are dynamic in nature and are changing
frequently. New content gets added to webpages, and existing content gets updated or …

Smart distributed web crawler

SK Bal, G Geetha - 2016 International Conference on …, 2016 - ieeexplore.ieee.org
Centralized crawlers are not adequate to spider meaningful and relevant portions of the
Web. A crawler with good scalability and load balancing can bring growth to performance …

[BOOK][B] Advanced topics in information retrieval

M Melucci, R Baeza-Yates - 2011 - books.google.com
Information retrieval is the science concerned with the effective and efficient retrieval of
documents starting from their semantic content. It is employed to fulfill some information …

On the feasibility of geographically distributed web crawling

BB Cambazoglu, F Junqueira, V Plachouras… - 3rd International ICST …, 2010 - eudl.eu
We identify the issues that are important in design of a geographically distributed Web
crawler. The identified issues are discussed from a "benefit" and "challenge" point of view …

Discovering URLs through user feedback

X Bai, BB Cambazoglu, FP Junqueira - Proceedings of the 20th ACM …, 2011 - dl.acm.org
Search engines rely upon crawling to build their Web page collections. A Web crawler
typically discovers new URLs by following the link structure induced by links on Web pages …

Crowdcrawling approach for community based plagiarism detection service

S Butakov - Proceedings of the 23rd International Conference on …, 2014 - dl.acm.org
In the era of exponentially growing web and exploding online education the problem of
digital plagiarism has become one of the most burning ones in many areas. Efficient internet …

Estratégias de partição para a optimização da descarga distribuída de Web

JLP Exposto - 2008 - search.proquest.com
Estratégias de partição para a optimização da descarga distribuída da Web.
Universidade do Minho, Escola de Engenharia, Departamento de Informática. Estratégias de …

Hypergraph-theoretic partitioning models for parallel web crawling

A Turk, BB Cambazoglu, C Aykanat - Computer and Information Sciences …, 2012 - Springer
Parallel web crawling is an important technique employed by large-scale search engines for
content acquisition. A commonly used inter-processor coordination scheme in parallel …

Hybrid task assignment for web crawling

G Von Bochmann, GVR Jourdan, IV Onut… - US Patent …, 2022 - Google Patents
A computer-implemented method and/or computer program product selectively assigns a
task using a hybrid task assignment process. One or more processors direct a working …