Efficient partitioning strategies for distributed web crawling

BB Cambazoglu, R Baeza-Yates - Advanced topics in information retrieval, 2011 - Springer

Continuous growth of the Web and user bases forces web search engine companies to
make costly investments on very large compute infrastructures. The scalability of these …

Cited by 110 · Related articles · All 12 versions

[PDF] arxiv.org

Change detection and notification of web pages: A survey

V Mallawaarachchi, L Meegahapola… - ACM Computing …, 2020 - dl.acm.org

The majority of currently available webpages are dynamic in nature and are changing
frequently. New content gets added to webpages, and existing content gets updated or …

Cited by 21 · Related articles · All 12 versions

Smart distributed web crawler

SK Bal, G Geetha - 2016 International Conference on …, 2016 - ieeexplore.ieee.org

Centralized crawlers are not adequate to spider meaningful and relevant portions of the
Web. A crawler with good scalability and load balancing can bring growth to performance …

Cited by 19 · Related articles · All 2 versions

[BOOK][B] Advanced topics in information retrieval

M Melucci, R Baeza-Yates - 2011 - books.google.com

Information retrieval is the science concerned with the effective and efficient retrieval of
documents starting from their semantic content. It is employed to fulfill some information …

Cited by 33 · Related articles · All 7 versions

[PDF] eudl.eu

On the feasibility of geographically distributed web crawling

BB Cambazoglu, F Junqueira, V Plachouras… - 3rd International ICST …, 2010 - eudl.eu

We identify the issues that are important in design of a geographically distributed Web
crawler. The identified issues are discussed from a "benefit" and "challenge" point of view …

Cited by 36 · Related articles · All 3 versions

[PDF] keio.ac.jp

Discovering URLs through user feedback

X Bai, BB Cambazoglu, FP Junqueira - Proceedings of the 20th ACM …, 2011 - dl.acm.org

Search engines rely upon crawling to build their Web page collections. A Web crawler
typically discovers new URLs by following the link structure induced by links on Web pages …

Cited by 11 · Related articles

[PDF] archive.org

Crowdcrawling approach for community based plagiarism detection service

S Butakov - Proceedings of the 23rd International Conference on …, 2014 - dl.acm.org

In the era of exponentially growing web and exploding online education the problem of
digital plagiarism has become one of the most burning ones in many areas. Efficient internet …

Cited by 4 · Related articles · All 2 versions

[PDF] ipb.pt

Estratégias de partição para a optimização da descarga distribuída de Web

JLP Exposto - 2008 - search.proquest.com

Estratégias de partição para a optimização da descarga distribuída da Web.
Universidade do Minho, Escola de Engenharia, Departamento de Informática. Estratégias de …

Cited by 2 · Related articles · All 3 versions

[PDF] researchgate.net

Hypergraph-theoretic partitioning models for parallel web crawling

A Turk, BB Cambazoglu, C Aykanat - Computer and Information Sciences …, 2012 - Springer

Parallel web crawling is an important technique employed by large-scale search engines for
content acquisition. A commonly used inter-processor coordination scheme in parallel …

Cited by 1 · Related articles · All 9 versions

[PDF] googleapis.com

Hybrid task assignment for web crawling

G Von Bochmann, GVR Jourdan, IV Onut… - US Patent …, 2022 - Google Patents

A computer-implemented method and/or computer program product selectively assigns a
task using a hybrid task assignment process. One or more processors direct a working …
