Dorylus: Affordable, scalable, and accurate GNN training with distributed CPU servers and serverless threads

J Thorpe, Y Qiao, J Eyolfson, S Teng, G Hu… - … USENIX Symposium on …, 2021 - usenix.org
A graph neural network (GNN) enables deep learning on structured graph data. There are
two major GNN training obstacles: 1) it relies on high-end servers with many GPUs which …

GraphQ: Scalable PIM-based graph processing

Y Zhuo, C Wang, M Zhang, R Wang, D Niu… - Proceedings of the …, 2019 - dl.acm.org
Processing-In-Memory (PIM) architectures based on recent technology advances (e.g.,
Hybrid Memory Cube) demonstrate great potential for graph processing. However, existing …

Understanding and optimizing asynchronous low-precision stochastic gradient descent

C De Sa, M Feldman, C Ré, K Olukotun - Proceedings of the 44th annual …, 2017 - dl.acm.org
Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in
machine learning and other domains. Since this is likely to continue for the foreseeable …

Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics

R Dathathri, G Gill, L Hoang, HV Dang… - Proceedings of the 39th …, 2018 - dl.acm.org
This paper introduces a new approach to building distributed-memory graph analytics
systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies …

GraphBolt: Dependency-driven synchronous processing of streaming graphs

M Mariappan, K Vora - … of the Fourteenth EuroSys Conference 2019, 2019 - dl.acm.org
Efficient streaming graph processing systems leverage incremental processing by updating
computed results to reflect the change in graph structure for the latest graph snapshot …

Peregrine: a pattern-aware graph mining system

K Jamshidi, R Mahadasa, K Vora - Proceedings of the Fifteenth …, 2020 - dl.acm.org
Graph mining workloads aim to extract structural properties of a graph by exploring its
subgraph structures. General purpose graph mining systems provide a generic runtime to …

KickStarter: Fast and accurate computations on streaming graphs via trimmed approximations

K Vora, R Gupta, G Xu - Proceedings of the twenty-second international …, 2017 - dl.acm.org
Continuous processing of a streaming graph maintains an approximate result of the iterative
computation on a recent version of the graph. Upon a user query, the accurate result on the …

GraphPulse: An event-driven hardware accelerator for asynchronous graph processing

S Rahman, N Abu-Ghazaleh… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Graph processing workloads are memory intensive with irregular access patterns and large
memory footprint resulting in low data locality. Their popular software implementations …

DZiG: Sparsity-aware incremental processing of streaming graphs

M Mariappan, J Che, K Vora - … of the sixteenth European conference on …, 2021 - dl.acm.org
State-of-the-art streaming graph processing systems that provide Bulk Synchronous Parallel
(BSP) guarantees remain oblivious to the computation sparsity present in iterative graph …

Load the edges you need: A generic I/O optimization for disk-based graph processing

K Vora, G Xu, R Gupta - … Annual Technical Conference (USENIX ATC 16), 2016 - usenix.org
Single-PC, disk-based processing of big graphs has recently gained much popularity. At the
core of an efficient disk-based system is a well-designed partition structure that can minimize …