Toppling top lists: Evaluating the accuracy of popular website lists

K Ruth, D Kumar, B Wang, L Valenta… - Proceedings of the 22nd …, 2022 - dl.acm.org
Proceedings of the 22nd ACM Internet Measurement Conference, 2022. dl.acm.org
Researchers rely on lists of popular websites like the Alexa Top Million both to measure the web and to evaluate proposed protocols and systems. Prior work has questioned the correctness and consistency of these lists, but without ground truth data to compare against, there has been no direct evaluation of list accuracy. In this paper, we evaluate the relative accuracy of the most popular top lists of websites. We derive a set of popularity metrics from server-side requests seen at Cloudflare, which authoritatively serves a significant portion of the most popular websites. We evaluate top lists against these metrics and show that most lists capture web popularity poorly, with the exception of the Chrome User Experience Report (CrUX) dataset, which is the most accurate top list compared to Cloudflare across all metrics. We explore the biases that lower the accuracy of other lists, and we conclude with recommendations for researchers studying the web in the future.