GNNLab: a factored system for sample-based GNN training over GPUs

J Yang, D Tang, X Song, L Wang, Q Yin… - Proceedings of the …, 2022 - dl.acm.org
We propose GNNLab, a sample-based GNN training system in a single machine multi-GPU
setup. GNNLab adopts a factored design for multiple GPUs, where each GPU is dedicated to …

mTCP: a highly scalable user-level TCP stack for multicore systems

EY Jeong, S Wood, M Jamshed, H Jeong… - … USENIX Symposium on …, 2014 - usenix.org
Scaling the performance of short TCP connections on multicore systems is fundamentally
challenging. Although many proposals have attempted to address various shortcomings …

The multikernel: a new OS architecture for scalable multicore systems

A Baumann, P Barham, PE Dagand, T Harris… - Proceedings of the …, 2009 - dl.acm.org
Commodity computer systems contain more and more processor cores and exhibit
increasingly diverse architectural tradeoffs, including memory hierarchies, interconnects …

Tales of the tail: Hardware, OS, and application-level sources of tail latency

J Li, NK Sharma, DRK Ports, SD Gribble - Proceedings of the ACM …, 2014 - dl.acm.org
Interactive services often have large-scale parallel implementations. To deliver fast
responses, the median and tail latencies of a service's components must be low. In this …

Energy efficient allocation of virtual machines in cloud data centers

A Beloglazov, R Buyya - … on Cluster, Cloud and Grid Computing, 2010 - ieeexplore.ieee.org
Rapid growth of the demand for computational power has led to the creation of large-scale
data centers. They consume enormous amounts of electrical power resulting in high …

An analysis of Linux scalability to many cores

S Boyd-Wickizer, AT Clements, Y Mao… - … USENIX Symposium on …, 2010 - usenix.org
This paper analyzes the scalability of seven system applications (Exim, memcached,
Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48-core …

Everything you always wanted to know about synchronization but were afraid to ask

T David, R Guerraoui, V Trigonakis - Proceedings of the Twenty-Fourth …, 2013 - dl.acm.org
This paper presents the most exhaustive study of synchronization to date. We span multiple
layers, from hardware cache-coherence protocols up to high-level concurrent software. We …

FlexSC: Flexible system call scheduling with Exception-Less system calls

L Soares, M Stumm - 9th USENIX Symposium on Operating Systems …, 2010 - usenix.org
For the past 30+ years, system calls have been the de facto interface used by applications to
request services from the operating system kernel. System calls have almost universally …

Corey: An Operating System for Many Cores

S Boyd-Wickizer, H Chen, R Chen, Y Mao… - OSDI, 2008 - usenix.org
Multiprocessor application performance can be limited by the operating system when the
application uses the operating system frequently and the operating system services use data …

The scalable commutativity rule: Designing scalable software for multicore processors

AT Clements, MF Kaashoek, N Zeldovich… - ACM Transactions on …, 2015 - dl.acm.org
What opportunities for multicore scalability are latent in software interfaces, such as system
call APIs? Can scalability challenges and opportunities be identified even before any …