Hardware-accelerated platforms and infrastructures for network functions: A survey of enabling technologies and research studies

P Shantharama, AS Thyagaturu, M Reisslein - IEEE Access, 2020 - ieeexplore.ieee.org
In order to facilitate flexible network service virtualization and migration, network functions
(NFs) are increasingly executed by software modules as so-called “softwarized NFs” on …

Runtime data management on non-volatile memory-based heterogeneous memory for task-parallel programs

K Wu, J Ren, D Li - SC18: International Conference for High …, 2018 - ieeexplore.ieee.org
Non-volatile memory (NVM) provides a scalable solution to replace DRAM as main memory.
Because of the relatively high latency and low bandwidth of NVM (compared with DRAM), NVM …

HermitCore: a unikernel for extreme scale computing

S Lankes, S Pickartz, J Breitbart - … of the 6th International Workshop on …, 2016 - dl.acm.org
We expect that the size and the complexity of future supercomputers will increase on their
path to exascale systems and beyond. Therefore, system software has to adapt to the …

Programming for exascale computers

W Gropp, M Snir - Computing in Science & Engineering, 2013 - ieeexplore.ieee.org
Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …

Structuring the execution of OpenMP applications for multicore architectures

F Broquedis, O Aumage, B Goglin… - … on Parallel & …, 2010 - ieeexplore.ieee.org
The now commonplace multi-core chips have introduced, by design, a deep hierarchy of
memory and cache banks within parallel computers as a tradeoff between the user …

Faithful performance prediction of a dynamic task‐based runtime system for heterogeneous multi‐core architectures

L Stanisic, S Thibault, A Legrand… - Concurrency and …, 2015 - Wiley Online Library
Multi‐core architectures comprising several graphics processing units (GPUs) have become
mainstream in the field of high‐performance computing. However, obtaining the maximum …

Using memory access traces to map threads and data on hierarchical multi-core platforms

EHM da Cruz, MAZ Alves, A Carissimi… - … on Parallel and …, 2011 - ieeexplore.ieee.org
In parallel programs, the tasks of a given application must cooperate in order to accomplish
the required computation. However, the communication time between the tasks may be …

Locality‐Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

A Muddukrishna, PA Jonsson… - Scientific …, 2015 - Wiley Online Library
Performance degradation due to nonuniform data access latencies has worsened on NUMA
systems and can now be felt on‐chip in manycore processors. Distributing data across …

Evaluating OpenMP Affinity on the POWER8 Architecture

S Pophale, O Hernandez - OpenMP: Memory, Devices, and Tasks: 12th …, 2016 - Springer
As we move toward pre-Exascale systems, two of the DOE leadership class systems will
consist of very powerful OpenPOWER compute nodes which will be more complex to …

Optimizing data locality for fork/join programs using constrained work stealing

J Lifflander, S Krishnamoorthy… - SC'14: Proceedings of …, 2014 - ieeexplore.ieee.org
We present an approach to improving data locality across different phases of fork/join
programs scheduled using work stealing. The approach consists of: (1) user-specified and …