Authors
Jaebeom You, Jaekyu Lee, Hyuk-Yoon Kwon
Publication date
2021/1/17
Conference
2021 IEEE International Conference on Big Data and Smart Computing (BigComp)
Pages
24-27
Publisher
IEEE
Description
In this paper, we propose a scraping method for collecting tweets, which we call DeepScrap. DeepScrap completely scrapes the recent tweets that can be viewed on a specific user's page, and it crawls fast enough to overcome the rate limits of the Twitter APIs. In particular, to improve the crawling speed of DeepScrap, we devise a multiprocessing architecture that assigns a different IP to each process so as to follow Twitter's robots.txt. This allows us to maximize the parallelism of crawling on a single machine. By analyzing the tweets of 97 users, we show that DeepScrap can crawl all the tweets that the Twitter standard APIs can crawl. Through extensive experiments, we show that DeepScrap can crawl all the tweets of the 97 users, which amount to 222,194 tweets, while the Twitter standard API can crawl only 12,586 of them because of its constraints. We also show that multiprocessing …
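The multiprocessing idea in the abstract (one process per IP) might look roughly like the Python sketch below. This is not the authors' implementation. The proxy addresses, the `partition`, `scrape_user`, `worker`, and `crawl` names, and the round-robin split are all assumptions made for illustration, and the actual page scraping is left as a placeholder because the abstract does not describe it.

```python
from multiprocessing import Pool

# Hypothetical proxy endpoints (documentation-range addresses), one per process.
PROXIES = [
    "http://192.0.2.10:8080",
    "http://192.0.2.11:8080",
    "http://192.0.2.12:8080",
]


def partition(users, proxies):
    """Split users round-robin so that each proxy (IP) gets its own batch."""
    batches = [(proxy, []) for proxy in proxies]
    for i, user in enumerate(users):
        batches[i % len(proxies)][1].append(user)
    return batches


def scrape_user(user, proxy):
    """Placeholder for the real page scraper (not described in the abstract).

    A real version would request the user's page through `proxy` and parse
    the tweets; this stub only records which proxy would have been used.
    """
    return {"user": user, "proxy": proxy, "tweets": []}


def worker(batch):
    """Runs in its own process: handle every user in the batch via one proxy."""
    proxy, users = batch
    return [scrape_user(user, proxy) for user in users]


def crawl(users, proxies=PROXIES):
    """Start one process per proxy and merge the per-process results."""
    batches = partition(users, proxies)
    with Pool(processes=len(proxies)) as pool:
        per_worker = pool.map(worker, batches)
    return [record for records in per_worker for record in records]


if __name__ == "__main__":
    for record in crawl(["user_a", "user_b", "user_c", "user_d"]):
        print(record["user"], "via", record["proxy"])
```

Binding each process to a single proxy keeps the per-IP request rate independent of the number of processes, which is one plausible reading of how the described design parallelizes crawling on one machine.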
Total citations
[Chart: citations per year, 2021 to 2024]
Scholar articles
J You, J Lee, HY Kwon - 2021 IEEE International Conference on Big Data and …, 2021