Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

Characterizing communication and page usage of parallel applications for thread and data mapping

M Diener, EHM Cruz, LL Pilla, F Dupros… - Performance …, 2015 - Elsevier
The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …

kMAF: Automatic kernel-level management of thread and data affinity

M Diener, EHM Cruz, POA Navaux, A Busse… - Proceedings of the 23rd …, 2014 - dl.acm.org
One of the main challenges for parallel architectures is the increasing complexity of the
memory hierarchy, which consists of several levels of private and shared caches, as well as …

Scalable task parallelism for NUMA: A uniform abstraction for coordinated scheduling and memory management

A Drebes, A Pop, K Heydemann, A Cohen… - Proceedings of the 2016 …, 2016 - dl.acm.org
Dynamic task-parallel programming models are popular on shared-memory systems,
promising enhanced scalability, load balancing and locality. Yet these promises are …

Using memory access traces to map threads and data on hierarchical multi-core platforms

EHM da Cruz, MAZ Alves, A Carissimi… - … on Parallel and …, 2011 - ieeexplore.ieee.org
In parallel programs, the tasks of a given application must cooperate in order to accomplish
the required computation. However, the communication time between the tasks may be …

Locality vs. balance: Exploring data mapping policies on NUMA systems

M Diener, EHM Cruz… - 2015 23rd Euromicro …, 2015 - ieeexplore.ieee.org
In parallel architectures that have a Non-Uniform Memory Access (NUMA) behavior, the
mapping of memory pages to NUMA nodes influences the performance of parallel …

A hierarchical approach for load balancing on parallel multi-core systems

LL Pilla, CP Ribeiro, D Cordeiro, C Mei… - 2012 41st …, 2012 - ieeexplore.ieee.org
Multi-core compute nodes with non-uniform memory access (NUMA) are now a common
architecture in the assembly of large-scale parallel machines. On these machines, in …

Kernel-based thread and data mapping for improved memory affinity

M Diener, EHM Cruz, MAZ Alves… - … on Parallel and …, 2015 - ieeexplore.ieee.org
Reducing the cost of memory accesses, both in terms of performance and energy
consumption, is a major challenge in shared-memory architectures. Modern systems have …

Optimizing machine learning algorithms on multi-core and many-core architectures using thread and data mapping

MS Serpa, AM Krause, EHM Cruz… - 2018 26th Euromicro …, 2018 - ieeexplore.ieee.org
Driven by the development of new technologies such as personal assistants or autonomous
cars, machine learning has rapidly become one of the most active fields in computer …

Hardware-assisted thread and data mapping in hierarchical multicore architectures

EHM Cruz, M Diener, LL Pilla… - ACM Transactions on …, 2016 - dl.acm.org
The performance and energy efficiency of modern architectures depend on memory locality,
which can be improved by thread and data mappings considering the memory access …