Sentiment mining in WebFountain

J Yi, W Niblack - … Conference on Data Engineering (ICDE'05), 2005 - ieeexplore.ieee.org
WebFountain is a platform for very large-scale text analytics applications that allows uniform
access to a wide variety of sources. It enables the deployment of a variety of document-level …

[BOOK][B] Data mining: concepts and techniques

J Han, J Pei, H Tong - 2022 - books.google.com
Data Mining: Concepts and Techniques, Fourth Edition introduces concepts, principles, and
methods for mining patterns, knowledge, and models from various kinds of data for diverse …

[BOOK][B] Web data mining: exploring hyperlinks, contents, and usage data

B Liu - 2011 - Springer
Liu has written a comprehensive text on Web mining, which consists of two parts. The first
part covers the data mining and machine learning foundations, where all the essential …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

Boilerplate detection using shallow text features

C Kohlschütter, P Fankhauser, W Nejdl - … on Web search and data mining, 2010 - dl.acm.org
In addition to the actual content, Web pages consist of navigational elements, templates, and
advertisements. This boilerplate text typically is not related to the main content, may …

VIPS: a vision-based page segmentation algorithm

D Cai, S Yu, JR Wen, WY Ma - 2003 - microsoft.com
A new web content structure analysis based on visual representation is proposed in this
paper. Many web applications such as information retrieval, information extraction and …

A survey on PageRank computing

P Berkhin - Internet mathematics, 2005 - Taylor & Francis
This survey reviews the research related to PageRank computing. Components of a
PageRank vector serve as authority weights for web pages independent of their textual …

Web data extraction based on partial tree alignment

Y Zhai, B Liu - Proceedings of the 14th international conference on …, 2005 - dl.acm.org
This paper studies the problem of extracting data from a Web page that contains several
structured data records. The objective is to segment these data records, extract data …

On the bursty evolution of blogspace

R Kumar, J Novak, P Raghavan… - Proceedings of the 12th …, 2003 - dl.acm.org
We propose two new tools to address the evolution of hyperlinked corpora. First, we define
time graphs to extend the traditional notion of an evolving directed graph, capturing link …

Eliminating noisy information in web pages for data mining

L Yi, B Liu, X Li - Proceedings of the ninth ACM SIGKDD international …, 2003 - dl.acm.org
A commercial Web page typically contains many information blocks. Apart from the main
content blocks, it usually has such blocks as navigation panels, copyright and privacy …