A survey of Web crawlers for information retrieval

M Kumar, R Bhatia, D Rattan - Wiley Interdisciplinary Reviews …, 2017 - Wiley Online Library
Performance of any search engine relies heavily on its Web crawler. Web crawlers are the
programs that get webpages from the Web by following hyperlinks. These webpages are …

Regularized cost-model oblivious database tuning with reinforcement learning

D Basu, Q Lin, W Chen, HT Vo, Z Yuan… - Transactions on Large …, 2016 - Springer
In this paper, we propose a learning approach to adaptive performance tuning of database
applications. The objective is to validate the opportunity to devise a tuning strategy that does …

Focused crawling through reinforcement learning

M Han, PH Wuillemin, P Senellart - … , ICWE 2018, Cáceres, Spain, June 5 …, 2018 - Springer
Focused crawling aims at collecting as many Web pages relevant to a target topic as
possible while avoiding irrelevant pages, reflecting limited resources available to a Web …

Selective harvesting over networks

F Murai, D Rennó, B Ribeiro, GL Pappa… - Data Mining and …, 2018 - Springer
Active search on graphs focuses on collecting certain labeled nodes (targets) given global
knowledge of the network topology and its edge weights (encoding pairwise similarities) …

Tree-based focused web crawling with reinforcement learning

A Kontogiannis, D Kelesis, V Pollatos… - arXiv preprint arXiv …, 2021 - arxiv.org
A focused crawler aims at discovering as many web pages relevant to a target topic as
possible, while avoiding irrelevant ones. Reinforcement Learning (RL) has been utilized to …

Reinforcement learning approaches in dynamic environments

M Han - 2018 - inria.hal.science
Reinforcement learning is learning from interaction with an environment to achieve a goal. It
is an efficient framework to solve sequential decision-making problems, using Markov …

A Frequent Named Entities Based Approach for Interpreting Reputation in Twitter

NB Seghouani, F Bugiotti… - Data Science and …, 2018 - inria.hal.science
Twitter is a social network that provides a powerful source of data. The analysis of those data
offers many challenges among those stands out the opportunity to find reputation of a …

Smart crawling: a new approach toward focus crawling from Twitter

A Khazaie, NB Seghouani, F Bugiotti - arXiv preprint arXiv:2110.06022, 2021 - arxiv.org
Twitter is a social network that offers a rich and interesting source of information challenging
to retrieve and analyze. Twitter data can be accessed using a REST API. The available …

ARCOMEM crawling architecture

V Plachouras, F Carpentier, M Faheem, J Masanès… - Future internet, 2014 - mdpi.com
The World Wide Web is the largest information repository available today. However, this
information is very volatile and Web archiving is essential to preserve it for the future …

Interpreting reputation through frequent named entities in twitter

N Bennacer, F Bugiotti, M Hewasinghage, S Isaj… - … Engineering–WISE 2017 …, 2017 - Springer
Twitter is a social network that provides a powerful source of data. The analysis of those data
offers many challenges among those stands out the opportunity to find the reputation of a …