A survey of research on focused crawler technology

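Zhou Lizhu, Lin Ling - Journal of Computer Applications (计算机应用), 2005 - joca.cn
The rapid growth of the Internet poses a major challenge for finding and discovering information on the World Wide Web. For the topic- or
domain-specific queries that most users issue, traditional general-purpose search engines often fail to return satisfactory result pages …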

A survey of Web crawlers for information retrieval

M Kumar, R Bhatia, D Rattan - Wiley Interdisciplinary Reviews …, 2017 - Wiley Online Library
Performance of any search engine relies heavily on its Web crawler. Web crawlers are the
programs that get webpages from the Web by following hyperlinks. These webpages are …

[BOOK][B] Modern information retrieval

R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …

[BOOK][B] Web archiving: issues and methods

J Masanès - 2006 - Springer
Cultural artifacts of the past have always had an important role in the formation of
consciousness and self-understanding of a society and the construction of its future. The …

Learning to crawl: Comparing classification schemes

G Pant, P Srinivasan - ACM Transactions on Information Systems (TOIS), 2005 - dl.acm.org
Topical crawling is a young and creative area of research that holds the promise of
benefiting from several sophisticated data mining techniques. The use of classification …

Link contexts in classifier-guided topical crawlers

G Pant, P Srinivasan - IEEE Transactions on knowledge and …, 2005 - ieeexplore.ieee.org
Context of a hyperlink or link context is defined as the terms that appear in the text around a
hyperlink within a Web page. Link contexts have been applied to a variety of Web …

Sentiment-focused web crawling

AG Vural, BB Cambazoglu, P Karagoz - ACM Transactions on the Web …, 2014 - dl.acm.org
Sentiments and opinions expressed in Web pages towards objects, entities, and products
constitute an important portion of the textual content available in the Web. In the last decade …

Structure-driven crawler generation by example

MLA Vidal, AS da Silva, ES de Moura… - Proceedings of the 29th …, 2006 - dl.acm.org
Many Web IR and Digital Library applications require a crawling process to collect pages
with the ultimate goal of taking advantage of useful information available on Web sites. For …

LSCrawler: a framework for an enhanced focused web crawler based on link semantics

M Yuvarani, A Kannan - … on Web Intelligence (WI 2006 Main …, 2006 - ieeexplore.ieee.org
The traditional process of focused web crawler is to harvest a collection of web documents
that are focused on the topical subspaces. The intricacy of focused crawlers is identifying the …

Finding seeds to bootstrap focused crawlers

K Vieira, L Barbosa, AS Da Silva, J Freire, E Moura - World Wide Web, 2016 - Springer
Focused crawlers are effective tools for applications requiring a high number of pages
belonging to a specific topic. Several strategies for implementing these crawlers have been …