Multilevel algorithms for acyclic partitioning of directed acyclic graphs

J Herrmann, MY Ozkaya, B Uçar, K Kaya… - SIAM Journal on …, 2019 - SIAM
We investigate the problem of partitioning the vertices of a directed acyclic graph into a
given number of parts. The objective function is to minimize the number or the total weight of …

Time complexity of in-memory solution of linear systems

Z Sun, G Pedretti, P Mannocci… - … on Electron Devices, 2020 - ieeexplore.ieee.org
In-memory computing (IMC) with cross-point resistive memory arrays has been shown to
accelerate data-centric computations, such as the training and inference of deep neural …

Pebbles, graphs, and a pinch of combinatorics: Towards tight I/O lower bounds for statically analyzable programs

G Kwasniewski, T Ben-Nun, L Gianinazzi… - Proceedings of the 33rd …, 2021 - dl.acm.org
Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel
algorithms, both across the memory hierarchy and between processors. Current approaches …

Parallel Loop Locality Analysis for Symbolic Thread Counts

F Liu, Y Zhu, S Sun, C Ding, W Smith… - Proceedings of the 2024 …, 2024 - dl.acm.org
Data movement limits program performance. This bottleneck is more significant in
multi-thread programs but more difficult to analyze, especially for multiple thread counts. For …

Acyclic partitioning of large directed acyclic graphs

J Herrmann, J Kho, B Uçar, K Kaya… - 2017 17th IEEE/ACM …, 2017 - ieeexplore.ieee.org
Finding a good partition of a computational directed acyclic graph associated with an
algorithm can help find an execution pattern improving data locality, conduct an analysis of …

Formal Verification of Source-to-Source Transformations for HLS

LN Pouchet, E Tucker, N Zhang, H Chen… - Proceedings of the …, 2024 - dl.acm.org
High-level synthesis (HLS) can greatly facilitate the description of complex hardware
implementations, by raising the level of abstraction up to a classical imperative language …

IOOpt: automatic derivation of I/O complexity bounds for affine programs

A Olivry, G Iooss, N Tollenaere, A Rountev… - Proceedings of the …, 2021 - dl.acm.org
Evaluating the complexity of an algorithm is an important step when developing
applications, as it impacts both its time and energy performance. Computational complexity …

Automated derivation of parametric data movement lower bounds for affine programs

A Olivry, J Langou, LN Pouchet… - Proceedings of the 41st …, 2020 - dl.acm.org
Researchers and practitioners have for long worked on improving the computational
complexity of algorithms, focusing on reducing the number of operations needed to perform …

Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach

S Pouget, LN Pouchet, J Cong - arXiv preprint arXiv:2405.12304, 2024 - arxiv.org
High-level synthesis, source-to-source compilers, and various Design Space Exploration
techniques for pragma insertion have significantly improved the Quality of Results of …

Finpar: A parallel financial benchmark

C Andreetta, V Bégot, J Berthold, M Elsman… - ACM Transactions on …, 2016 - dl.acm.org
Commodity many-core hardware is now mainstream, but parallel programming models are
still lagging behind in efficiently utilizing the application parallelism. There are (at least) two …