[BOOK] Modern information retrieval

R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …

Scalability challenges in web search engines

BB Cambazoglu, R Baeza-Yates - Advanced topics in information retrieval, 2011 - Springer
Continuous growth of the Web and user bases forces web search engine companies to
make costly investments in very large compute infrastructures. The scalability of these …

A brief history of web crawlers

SM Mirtaheri, ME Dinçtürk, S Hooshmand… - arXiv preprint arXiv …, 2014 - arxiv.org
Web crawlers visit internet applications, collect data, and learn about new web pages from
visited pages. Web crawlers have a long and interesting history. Early web crawlers …

Challenges on distributed web retrieval

R Baeza-Yates, C Castillo, F Junqueira… - 2007 IEEE 23rd …, 2006 - ieeexplore.ieee.org
In the ocean of Web data, Web search engines are the primary way to access content. As the
data is on the order of petabytes, current search engines are very large centralized systems …

Information retrieval in web crawling: A survey

C Saini, V Arora - 2016 International Conference on Advances …, 2016 - ieeexplore.ieee.org
In today's scenario, the World Wide Web (WWW) is flooded with a huge amount of information.
Due to the growing popularity of the internet, finding the meaningful information among billions …

On the feasibility of multi-site web search engines

R Baeza-Yates, A Gionis, F Junqueira… - Proceedings of the 18th …, 2009 - dl.acm.org
Web search engines are often implemented as centralized systems. Designing and
implementing a Web search engine in a distributed environment is a challenging …

[PDF] Distributed Web crawler over wide area networks (广域网分布式Web爬虫)

X Xu, W Zhang, H Zhang, B Fang - Journal of Software (软件学报), 2010 - jos.org.cn
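Analyzes the many advantages of wide area network (WAN) distributed Web crawlers over local area network (LAN) crawlers, and
identifies three core problems for WAN-distributed Web crawlers: Web partitioning, Agent coordination, and Agent deployment. Around these three problems …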

Geographically focused collaborative crawling

W Gao, HC Lee, Y Miao - … of the 15th international conference on World …, 2006 - dl.acm.org
A collaborative crawler is a group of crawling nodes, in which each crawling node is
responsible for a specific portion of the web. We study the problem of collecting geographi …

[图书][B] Advanced topics in information retrieval

M Melucci, R Baeza-Yates - 2011 - books.google.com
Information retrieval is the science concerned with the effective and efficient retrieval of
documents starting from their semantic content. It is employed to fulfill some information …

Unicrawl: A practical geographically distributed web crawler

C Fetzer, P Felber, É Rivière… - 2015 IEEE 8th …, 2015 - ieeexplore.ieee.org
As the wealth of information available on the web keeps growing, being able to harvest
massive amounts of data has become a major challenge. Web crawlers are the core …