In-memory computing (IMC) with cross-point resistive memory arrays has been shown to accelerate data-centric computations, such as the training and inference of deep neural …
Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorithms, both across the memory hierarchy and between processors. Current approaches …
F Liu, Y Zhu, S Sun, C Ding, W Smith… - Proceedings of the 2024 …, 2024 - dl.acm.org
Data movement limits program performance. This bottleneck is more significant in multi-thread programs but more difficult to analyze, especially for multiple thread counts. For …
Finding a good partition of a computational directed acyclic graph associated with an algorithm can help find an execution pattern improving data locality, conduct an analysis of …
High-level synthesis (HLS) can greatly facilitate the description of complex hardware implementations, by raising the level of abstraction up to a classical imperative language …
A Olivry, G Iooss, N Tollenaere, A Rountev… - Proceedings of the …, 2021 - dl.acm.org
Evaluating the complexity of an algorithm is an important step when developing applications, as it impacts both its time and energy performance. Computational complexity …
A Olivry, J Langou, LN Pouchet… - Proceedings of the 41st …, 2020 - dl.acm.org
Researchers and practitioners have long worked on improving the computational complexity of algorithms, focusing on reducing the number of operations needed to perform …
High-level synthesis, source-to-source compilers, and various Design Space Exploration techniques for pragma insertion have significantly improved the Quality of Results of …
C Andreetta, V Bégot, J Berthold, M Elsman… - ACM Transactions on …, 2016 - dl.acm.org
Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two …