Dynamic task and data placement over NUMA architectures: an OpenMP runtime perspective

Hardware-accelerated platforms and infrastructures for network functions: A survey of enabling technologies and research studies

P Shantharama, AS Thyagaturu, M Reisslein - IEEE Access, 2020 - ieeexplore.ieee.org

In order to facilitate flexible network service virtualization and migration, network functions
(NFs) are increasingly executed by software modules as so-called “softwarized NFs” on …

Cited by 88 · Related articles · All 9 versions

[PDF] nsf.gov

Runtime data management on non-volatile memory-based heterogeneous memory for task-parallel programs

K Wu, J Ren, D Li - SC18: International Conference for High …, 2018 - ieeexplore.ieee.org

Non-volatile memory (NVM) provides a scalable solution to replace DRAM as main memory.
Because of relatively high latency and low bandwidth of NVM (comparing with DRAM), NVM …

Cited by 65 · Related articles · All 9 versions

[PDF] ethz.ch

HermitCore: a unikernel for extreme scale computing

S Lankes, S Pickartz, J Breitbart - … of the 6th International Workshop on …, 2016 - dl.acm.org

We expect that the size and the complexity of future supercomputers will increase on their
path to exascale systems and beyond. Therefore, system software has to adapt to the …

Cited by 76 · Related articles · All 7 versions

[PDF] illinois.edu

Programming for exascale computers

W Gropp, M Snir - Computing in Science & Engineering, 2013 - ieeexplore.ieee.org

Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …

Cited by 67 · Related articles · All 10 versions

[PDF] hal.science

Structuring the execution of OpenMP applications for multicore architectures

F Broquedis, O Aumage, B Goglin… - … on Parallel & …, 2010 - ieeexplore.ieee.org

The now commonplace multi-core chips have introduced, by design, a deep hierarchy of
memory and cache banks within parallel computers as a tradeoff between the user …

Cited by 86 · Related articles · All 11 versions

[PDF] hal.science

Faithful performance prediction of a dynamic task‐based runtime system for heterogeneous multi‐core architectures

L Stanisic, S Thibault, A Legrand… - Concurrency and …, 2015 - Wiley Online Library

Multi‐core architectures comprising several graphics processing units (GPUs) have become
mainstream in the field of high‐performance computing. However, obtaining the maximum …

Cited by 55 · Related articles · All 7 versions

[PDF] ufpr.br

Using memory access traces to map threads and data on hierarchical multi-core platforms

EHM da Cruz, MAZ Alves, A Carissimi… - … on Parallel and …, 2011 - ieeexplore.ieee.org

In parallel programs, the tasks of a given application must cooperate in order to accomplish
the required computation. However, the communication time between the tasks may be …

Cited by 59 · Related articles · All 4 versions

[PDF] wiley.com Full View

Locality‐Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

A Muddukrishna, PA Jonsson… - Scientific …, 2015 - Wiley Online Library

Performance degradation due to nonuniform data access latencies has worsened on NUMA
systems and can now be felt on‐chip in manycore processors. Distributing data across …

Cited by 38 · Related articles · All 9 versions

Evaluating OpenMP Affinity on the POWER8 Architecture

S Pophale, O Hernandez - OpenMP: Memory, Devices, and Tasks: 12th …, 2016 - Springer

As we move toward pre-Exascale systems, two of the DOE leadership class systems will
consist of very powerful OpenPOWER compute nodes which will be more complex to …

Cited by 36 · Related articles

[PDF] jlifflander.com

Optimizing data locality for fork/join programs using constrained work stealing

J Lifflander, S Krishnamoorthy… - SC'14: Proceedings of …, 2014 - ieeexplore.ieee.org

We present an approach to improving data locality across different phases of fork/join
programs scheduled using work stealing. The approach consists of:(1) user-specified and …

Cited by 40 · Related articles · All 18 versions
