Few-to-many: Incremental parallelism for reducing tail latency in interactive services

ME Haque, YH Eom, Y He, S Elnikety, R Bianchini… - ACM Sigplan …, 2015 - dl.acm.org
Interactive services, such as Web search, recommendations, games, and finance, must
respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ …

Adaptive, efficient, parallel execution of parallel programs

S Sridharan, G Gupta, GS Sohi - Proceedings of the 35th ACM SIGPLAN …, 2014 - dl.acm.org
Future multicore processors will be heterogeneous, be increasingly less reliable, and
operate in dynamically changing operating conditions. Such environments will result in a …

Providing high‐level self‐adaptive abstractions for stream parallelism on multicores

A Vogel, D Griebler… - Software: practice and …, 2021 - Wiley Online Library
Stream processing applications are common computing workloads that demand parallelism
to increase their performance. As in the past, parallel programming remains a difficult task …

Work stealing for interactive services to meet target latency

J Li, K Agrawal, S Elnikety, Y He, ITA Lee, C Lu… - Proceedings of the 21st …, 2016 - dl.acm.org
Interactive web services increasingly drive critical business workloads such as search,
advertising, games, shopping, and finance. Whereas optimizing parallel programs and …

A portable, automatic data quantizer for deep neural networks

YH Oh, Q Quan, D Kim, S Kim, J Heo, S Jung… - Proceedings of the 27th …, 2018 - dl.acm.org
With the proliferation of AI-based applications and services, there are strong demands for
efficient processing of deep neural networks (DNNs). DNNs are known to be both compute …

Smart, adaptive mapping of parallelism in the presence of external workload

MK Emani, Z Wang, MFP O'Boyle - Proceedings of the 2013 …, 2013 - ieeexplore.ieee.org
Given the wide scale adoption of multi-cores in main stream computing, parallel programs
rarely execute in isolation and have to share the platform with other applications that …

Swift machine learning model serving scheduling: a region based reinforcement learning approach

H Qin, S Zawad, Y Zhou, L Yang, D Zhao… - Proceedings of the …, 2019 - dl.acm.org
The success of machine learning has prospered Machine-Learning-as-a-Service (MLaaS)-
deploying trained machine learning (ML) models in cloud to provide low latency inference …

Adaptive parallelism for web search

M Jeon, Y He, S Elnikety, AL Cox, S Rixner - Proceedings of the 8th …, 2013 - dl.acm.org
A web search query made to Microsoft Bing is currently parallelized by distributing the query
processing across many servers. Within each of these servers, the query is, however …

Holistic run-time parallelism management for time and energy efficiency

S Sridharan, G Gupta, GS Sohi - Proceedings of the 27th international …, 2013 - dl.acm.org
The ubiquity of parallel machines will necessitate time-and energy-efficient parallel
execution of a program in a wide range of hardware and software environments. Prevalent …

Parcae: a system for flexible parallel execution

A Raman, A Zaks, JW Lee, DI August - ACM SIGPLAN Notices, 2012 - dl.acm.org
Workload, platform, and available resources constitute a parallel program's execution
environment. Most parallelization efforts statically target an anticipated range of …