Operating systems and hypervisors for network functions: A survey of enabling technologies and research studies

AS Thyagaturu, P Shantharama, A Nasrallah… - IEEE …, 2022 - ieeexplore.ieee.org
Scalable and flexible communication networks increasingly conduct the packet processing
for Network Functions (NFs) in General Purpose Computing (GPC) platforms. The …

Advanced synchronization techniques for task-based runtime systems

D Álvarez, K Sala, M Maroñas, A Roca… - Proceedings of the 26th …, 2021 - dl.acm.org
Task-based programming models like OmpSs-2 and OpenMP provide a flexible data-flow
execution model to exploit dynamic, irregular and nested parallelism. Providing an efficient …

LAWS: Locality-aware work-stealing for multi-socket multi-core architectures

Q Chen, M Guo, H Guan - Proceedings of the 28th ACM international …, 2014 - dl.acm.org
Modern mainstream powerful computers adopt Multi-Socket Multi-Core (MSMC) CPU
architecture and NUMA-based memory architecture. While traditional work-stealing …

Locality‐Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

A Muddukrishna, PA Jonsson… - Scientific …, 2015 - Wiley Online Library
Performance degradation due to nonuniform data access latencies has worsened on NUMA
systems and can now be felt on‐chip in manycore processors. Distributing data across …

Bandwidth and locality aware task-stealing for manycore architectures with bandwidth-asymmetric memory

H Zhao, Q Chen, Y Qiu, M Wu, Y Shen, J Leng… - ACM Transactions on …, 2018 - dl.acm.org
Parallel computers now start to adopt Bandwidth-Asymmetric Memory architecture that
consists of traditional DRAM memory and new High Bandwidth Memory (HBM) for high …

Contention and locality-aware work-stealing for iterative applications in multi-socket computers

Q Chen, M Guo - IEEE Transactions on Computers, 2017 - ieeexplore.ieee.org
Modern large-scale computers have shifted to Multi-socket Multi-core (MSMC) architectures,
where multiple CPU chips are integrated into a machine as sockets and multiple memory …

Locality-aware work stealing based on online profiling and auto-tuning for multisocket multicore architectures

Q Chen, M Guo - ACM Transactions on Architecture and Code …, 2015 - dl.acm.org
Modern mainstream powerful computers adopt multisocket multicore CPU architecture and
NUMA-based memory architecture. While traditional work-stealing schedulers are designed …

Constructing a logical tree topology in a parallel computer

CJ Archer, NJ KA, SS Sharkawi - US Patent 9,336,053, 2016 - Google Patents
Constructing a logical tree topology in a parallel computer that includes compute nodes,
where each node executes a number of tasks and at least one node executes a number of …

The C∀ Scheduler

T Delisle - 2022 - uwspace.uwaterloo.ca
User-Level threading (M:N) is gaining popularity over kernel-level threading (1:1) in many
programming languages. The user threading approach is often a better mechanism to …

Constructing a logical tree topology in a parallel computer

CJ Archer, NJ KA, SS Sharkawi - US Patent 9,348,651, 2016 - Google Patents
Constructing a logical tree topology in a parallel computer that includes compute nodes,
where each node executes a number of tasks and at least one node executes a number of …