Automatic Generation of Distributed-Memory Mappings for Tensor Computations

M Kong, R Abu Yosef, A Rountev… - Proceedings of the …, 2023 - dl.acm.org
While considerable research has been directed at automatic parallelization for shared-
memory platforms, little progress has been made in automatic parallelization schemes for …

Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation

K Abdelaal, M Kong - Proceedings of the ACM International Conference …, 2021 - dl.acm.org
Loop tiling is a key high-level transformation which is known to maximize locality in loop
intensive programs. It has been successfully applied to a number of applications including …

On the analysis of backscatter traffic

E Balkanli, AN Zincir-Heywood - 39th Annual IEEE Conference …, 2014 - ieeexplore.ieee.org
This work offers in-depth analysis of three different darknet datasets captured in 2004, 2006
and 2008 to provide insights into the nature of backscatter traffic. Moreover, we analyzed …

Pipes: a language and compiler for task-based programming on distributed-memory clusters

M Kong, LN Pouchet, P Sadayappan… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
Applications running on clusters of shared-memory computers are often implemented using
OpenMP+MPI. Productivity can be vastly improved using task-based programming, a …

A Pipeline Pattern Detection Technique in Polly

D Talaashrafi, J Doerfert, M Moreno Maza - Workshop Proceedings of the …, 2022 - dl.acm.org
The polyhedral model has repeatedly shown how it facilitates various loop transformations,
including loop parallelization, loop tiling, and software pipelining. However, parallelism is …

TLP: Towards three‐level loop parallelisation

S Mahjoub, M Golsorkhtabaramiri… - IET Computers & …, 2022 - Wiley Online Library
Due to the design of computer systems in the multi‐core and/or multi‐processor form, it is
possible to use the maximum capacity of processors to run an application with the least time …

Intra-tile parallelization for two-level perfectly nested loops with non-uniform dependences

Z Abdi Reyhan, S Lotfi, A Isazadeh… - The Computer …, 2021 - academic.oup.com
Most important scientific and engineering applications have complex computations or large
data. In all these applications, a huge amount of time is consumed by nested loops …

Advances in the Automatic Detection of Optimization Opportunities in Computer Programs

D Talaashrafi - 2022 - search.proquest.com
Massively parallel and heterogeneous systems together with their APIs have been used for
various applications. To achieve high-performance software, the programmer should …

An approach to estimating the locality of grained computational processes logically organized into a two-dimensional structure

АА Толстиков, СВ Баханович… - Вестник Южно …, 2021 - cyberleninka.ru
When algorithms are implemented on multiprocessor computing devices, locality plays
a crucial role in achieving high performance …

A method for estimating the locality of parallel algorithms targeting distributed-memory computers

НА Лиходед, АА Толстиков - … Национальной академии наук …, 2020 - doklady.belnauka.by
Abstract The degree to which fast-access memory is used reflects a
computational property of an algorithm called locality. For parallel …