Web data extraction, applications and techniques: A survey

E Ferrara, P De Meo, G Fiumara… - Knowledge-based …, 2014 - Elsevier
Abstract Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many approaches to extracting …

A survey of web information extraction systems

CH Chang, M Kayed, MR Girgis… - IEEE transactions on …, 2006 - ieeexplore.ieee.org
The Internet presents a huge amount of useful information which is usually formatted for its
users, which makes it difficult to extract relevant data from various sources. Therefore, the …

V2V4Real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception

R Xu, X Xia, J Li, H Li, S Zhang, Z Tu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Modern perception systems of autonomous vehicles are known to be sensitive to occlusions
and lack the capability of long perceiving range. It has been one of the key bottlenecks that …

A survey on data collection for machine learning: a big data-AI integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

Editable scene simulation for autonomous driving via collaborative LLM-agents

Y Wei, Z Wang, Y Lu, C Xu, C Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Scene simulation in autonomous driving has gained significant attention because of its huge
potential for generating customized data. However, existing editable scene simulation …

[BOOK][B] Web data mining: exploring hyperlinks, contents, and usage data

B Liu - 2011 - Springer
Liu has written a comprehensive text on Web mining, which consists of two parts. The first
part covers the data mining and machine learning foundations, where all the essential …

Trafilatura: A web scraping library and command-line tool for text discovery and extraction

A Barbaresi - Proceedings of the 59th Annual Meeting of the …, 2021 - aclanthology.org
An essential operation in web corpus construction consists in retaining the desired content
while discarding the rest. Another challenge consists in finding one's way through websites. This article …

Information extraction

S Sarawagi - Foundations and Trends® in Databases, 2008 - nowpublishers.com
The automatic extraction of information from unstructured sources has opened up new
avenues for querying, organizing, and analyzing data by drawing upon the clean semantics …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

[PDF][PDF] A survey of Deep Web data integration

刘伟, 孟小峰, 孟卫一 - 计算机学报 (Chinese Journal of Computers), 2007 - c.xml.org.cn
With the rapid development of the World Wide Web, a tremendous amount of information is "hidden" in
the Deep Web, and its volume is increasing rapidly. The information can only be accessed by …