A methodology for efficient tile size selection for affine loop kernels

V Kelefouras, K Djemame, G Keramidas… - International Journal of …, 2022 - Springer
Reducing the number of data accesses in memory hierarchy is of paramount importance on
modern computer systems. One of the key optimizations addressing this problem is loop …

The fastest Fourier transform in the south

AM Blake, IH Witten, MJ Cree - IEEE transactions on signal …, 2013 - ieeexplore.ieee.org
This paper describes FFTS, a discrete Fourier transform (DFT) library that achieves state-of-
the-art performance using a new cache-oblivious algorithm implemented with run-time …

An ultra-long FFT architecture implemented in a reconfigurable application specified processor

F Han, L Li, K Wang, F Feng, H Pan, B Zhang… - IEICE Electronics …, 2016 - jstage.jst.go.jp
This paper presents an efficient architecture for performing 128 points to 1M points Fast
Fourier Transformation (FFT) based on mixed radix-2/4/8 butterfly unit. The proposed FFT …

Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage

M Bahtat, S Belkouch, P Elleaume, P Le Gall - EURASIP Journal on …, 2016 - Springer
The fast Fourier transform (FFT) is perhaps today's most ubiquitous algorithm used with
digital data; hence, it is still being studied extensively. Besides the benefit of reducing the …

Computing the fast Fourier transform on SIMD microprocessors

AM Blake - 2012 - researchcommons.waikato.ac.nz
This thesis describes how to compute the fast Fourier transform (FFT) of a power-of-two
length signal on single-instruction, multiple-data (SIMD) microprocessors faster than or very …

A methodology for speeding up edge and line detection algorithms focusing on memory architecture utilization

V Kelefouras, A Kritikakou, C Goutis - The Journal of Supercomputing, 2014 - Springer
In this paper, a new methodology for speeding up edge and line detection algorithms is
presented, achieving improved performance over the state of the art software library …

An analytical model for loop tiling transformation

V Kelefouras, K Djemame, G Keramidas… - … Conference on Embedded …, 2021 - Springer
Loop tiling is a well-known loop transformation that enhances data locality in memory
hierarchy. In this paper, we initially reveal two important inefficiencies of current analytical …

A methodology for speeding up MVM for regular, Toeplitz and bisymmetric Toeplitz matrices

VI Kelefouras, AS Kritikakou, K Siourounis… - Journal of Signal …, 2014 - Springer
The Matrix Vector Multiplication algorithm is an important kernel in most varied
domains and application areas and the performance of its implementations highly depends …

A methodology for speeding up loop kernels by exploiting the software information and the memory architecture

V Kelefouras, A Kritikakou, C Goutis - Computer Languages, Systems & …, 2015 - Elsevier
It is well-known that today's compilers and state of the art libraries have three major
drawbacks. First, the compiler sub-problems are optimized separately; this is not efficient …

Adaptation du calcul de la Transformée de Fourier Rapide sur une architecture mixte CPU/GPU intégrée

MA Bergach - 2015 - inria.hal.science
Intel Core multi-core architectures (IvyBridge, Haswell, ...) contain both
general-purpose CPU cores (4) and dedicated GPU cores embedded on the same …