Mapping dense LU factorization on multicore supercomputer nodes

J Lifflander, NL Slattengren, PP Pébaÿ… - 2021 IEEE …, 2021 - ieeexplore.ieee.org

This paper explores dynamic load balancing algorithms used by asynchronous many-task
(AMT), or 'task-based', programming models to optimize task placement for scientific …

Cited by 5 · Related articles · All 4 versions

Improved Data Locality Using Morton-order Curve on the Example of LU Decomposition

M Perdacher, C Plant, C Böhm - 2020 IEEE International …, 2020 - ieeexplore.ieee.org

The LU decomposition is an essential element used in many linear algebra applications.
Furthermore, it is used in LINPACK to benchmark the performance of modern multi-core …

Cited by 6 · Related articles · All 3 versions

[PDF] sciencedirect.com

Automatic translation of MPI source into a latency-tolerant, data-driven form

T Nguyen, P Cicotti, E Bylaska, D Quinlan… - Journal of Parallel and …, 2017 - Elsevier

Hiding communication behind useful computation is an important performance programming
technique but remains an inscrutable programming exercise even for the expert. We present …

Cited by 6 · Related articles · All 7 versions

[PDF] uiuc.edu

Controlling concurrency and expressing synchronization in Charm++ programs

LV Kale, J Lifflander - Concurrent Objects and Beyond: Papers dedicated …, 2014 - Springer

Charm++ is a parallel programming system that evolved over the past 20 years to become a
well-established system for programming parallel science and engineering applications, in …

Cited by 7 · Related articles · All 8 versions

[PDF] ucsd.edu

LU factorization: Towards hiding communication overheads with a lookahead-free algorithm

T Nguyen, SB Baden - 2015 IEEE International Conference on …, 2015 - ieeexplore.ieee.org

Lookahead is a well-known technique for masking communication in matrix factorization, but
at the cost of complicating application software. We present a new approach, based on …

Cited by 5 · Related articles · All 5 versions

[PDF] cyberleninka.ru

Multilevel algorithms for mapping parallel MPI programs onto computing clusters

АА Пазников, МГ Курносов… - Проблемы …, 2015 - cyberleninka.ru

This paper considers the problem of mapping parallel MPI programs onto
hierarchical cluster computing systems (CS). It is required, given a …

Cited by 3 · Related articles · All 4 versions

Implementation and analysis of block dense matrix decomposition on network-on-chips

TC Xu, T Pahikkala, A Airola, P Liljeberg… - 2012 IEEE 14th …, 2012 - ieeexplore.ieee.org

The decomposition of a dense matrix into lower and upper triangular matrices is an
important linear algebra kernel that is used in scientific and engineering applications. To …

Cited by 4 · Related articles · All 4 versions

[PDF] illinois.edu

Scalable algorithms for constructing balanced spanning trees on system-ranked process groups

A Langer, R Venkataraman, L Kale - Recent Advances in the Message …, 2012 - Springer

Current implementations of process groups (subcommunicators) have non-scalable
(O(group size)) memory footprints and even worse time complexities for setting up …

Cited by 3 · Related articles · All 13 versions

Parsimonious renewable energy management policies for smart IoT devices

KR Islam, S Tabassum, T Adhikary… - 2016 5th International …, 2016 - ieeexplore.ieee.org

Energy scarcity at homes is becoming a critical issue due to exponential growth of energy
consumption by numerous smart home appliances. Renewable energy sources help to …

Cited by 3 · Related articles · All 2 versions

[PDF] univie.ac.at

[PDF] Space-filling curves for improved cache-locality in shared memory environments

DIMA Perdacher - 2020 - phaidra.univie.ac.at

Today's microprocessors consist of multiple cores each of which can perform multiple
additions, multiplications, or other operations simultaneously in one clock cycle. In shared …
