Einops: Clear and reliable tensor manipulations with Einstein-like notation

A Rogozhnikov - International Conference on Learning …, 2021 - openreview.net
Tensor computations underlie modern scientific computing and deep learning. A number of
tensor frameworks emerged varying in execution model, hardware support, memory …

A methodology for comparing optimization algorithms for auto-tuning

FJ Willemsen, R Schoonhoven, J Filipovič… - Future Generation …, 2024 - Elsevier
Adapting applications to optimally utilize available hardware is no mean feat: the plethora of
choices for optimization techniques are infeasible to maximize manually. To this end, auto …

Negative perceptions about the applicability of source-to-source compilers in HPC: A literature review

R Milewicz, P Pirkelbauer, P Soundararajan… - … Computing: ISC High …, 2021 - Springer
A source-to-source compiler is a type of translator that accepts the source code of a program
written in a programming language as its input and produces an equivalent source code in …

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay

K Parasyris, G Georgakoudis, E Rangel… - Proceedings of the …, 2023 - dl.acm.org
HPC is a heterogeneous world in which host and device code are interleaved throughout
the application. Given the significant performance advantage of accelerators, device code …

DynaSOAr: a parallel memory allocator for object-oriented programming on GPUs with efficient memory access

M Springer, H Masuhara - arXiv preprint arXiv:1810.11765, 2018 - arxiv.org
Object-oriented programming has long been regarded as too inefficient for SIMD
high-performance computing, despite the fact that many important HPC applications have an …

Prediction of multicore CPU performance through parallel data mining on public datasets

NM Upadhyay, RS Singh, SP Dwivedi - Displays, 2022 - Elsevier
In the present scenario, high-performance computing needs more attention towards
multicore computing. While designing the CPU, we need to consider hardware for …

GPU acceleration of range queries over large data sets

M Nelson, Z Sorenson, JM Myre, J Sawin… - Proceedings of the 6th …, 2019 - dl.acm.org
Data management systems commonly use bitmap indices to increase the efficiency of
querying scientific data. Bitmaps are usually highly compressible and can be queried …

Dependency Prediction of Long-Time Resource Uses in HPC Environment

NM Upadhyay, RS Singh, SP Dwivedi - IEEE Access, 2023 - ieeexplore.ieee.org
High-Performance computing provides a new infrastructure for scientific calculation and its
simulation. However, unbalanced load distribution among the processors causes a …

Parallel acceleration of CPU and GPU range queries over large data sets

M Nelson, Z Sorenson, JM Myre, J Sawin… - Journal of Cloud …, 2020 - Springer
Data management systems commonly use bitmap indices to increase the efficiency of
querying scientific data. Bitmaps are usually highly compressible and can be queried …

Block-size independence for GPU programs

R Alur, J Devietti, N Singhania - …, SAS 2018, Freiburg, Germany, August 29 …, 2018 - Springer
Optimizing GPU programs by tuning execution parameters is essential to realizing the full
performance potential of GPU hardware. However, many of these optimizations do not …