Improving language models by retrieving from trillions of tokens

S Borgeaud, A Mensch, J Hoffmann… - International …, 2022 - proceedings.mlr.press
We enhance auto-regressive language models by conditioning on document chunks
retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 …

Wikinformetrics: Construction and description of an open Wikipedia knowledge graph data set for informetric purposes

W Arroyo-Machado, D Torres-Salinas… - Quantitative science …, 2022 - direct.mit.edu
Wikipedia is one of the most visited websites in the world and is also a frequent subject of
scientific research. However, the analytical possibilities of Wikipedia information have not …

Percolation on feature-enriched interconnected systems

O Artime, M De Domenico - Nature communications, 2021 - nature.com
Percolation is an emblematic model to assess the robustness of interconnected systems
when some of their components are corrupted. It is usually investigated in simple scenarios …

Radflow: A recurrent, aggregated, and decomposable model for networks of time series

A Tran, A Mathews, CS Ong, L Xie - Proceedings of the Web Conference …, 2021 - dl.acm.org
We propose a new model for networks of time series that influence each other. Graph
structures among time series are found in diverse domains, such as web traffic influenced by …

Relating Wikipedia article quality to edit behavior and link structure

T Ruprechter, T Santos, D Helic - Applied Network Science, 2020 - Springer
Currently, the relation between edit behavior, link structure, and article quality is not well-
understood in our community, notwithstanding that this relationship may facilitate editing …

Subset node representation learning over large dynamic graphs

X Guo, B Zhou, S Skiena - Proceedings of the 27th ACM SIGKDD …, 2021 - dl.acm.org
Dynamic graph representation learning is a task to learn node embeddings over dynamic
networks, and has many important applications, including knowledge graphs, citation …

Mining the online infosphere: A survey

S Adak, S Chakraborty, P Das, M Das… - … : Data Mining and …, 2022 - Wiley Online Library
The evolution of Artificial Intelligence (AI)‐based systems and applications have
pervaded everyday life to make decisions that have a momentous impact on individuals and …

Wikipedia reader navigation: When synthetic data is enough

A Arora, M Gerlach, T Piccardi, A García-Durán… - Proceedings of the …, 2022 - dl.acm.org
Every day millions of people read Wikipedia. When navigating the vast space of available
topics using hyperlinks, readers describe trajectories on the article network. Understanding …

Cross-relation characterization of knowledge networks

EK Tokuda, R Lambiotte, LF Costa - The European Physical Journal B, 2023 - Springer
Knowledge networks are large, interconnected data sets of knowledge that can be
represented, studied and modeled using complex networks concepts and methodologies …

WikiHist.html: English Wikipedia's full revision history in HTML format

B Mitrevski, T Piccardi, R West - … of the International AAAI Conference on …, 2020 - ojs.aaai.org
Wikipedia is written in the wikitext markup language. When serving content, the MediaWiki
software that powers Wikipedia parses wikitext to HTML, thereby inserting additional content …