Performance, design, and autotuning of batched GEMM for GPUs A Abdelfattah, A Haidar, S Tomov, J Dongarra High Performance Computing: 31st International Conference, ISC High …, 2016 | 133 | 2016 |
A survey of numerical linear algebra methods utilizing mixed-precision arithmetic A Abdelfattah, H Anzt, EG Boman, E Carson, T Cojean, J Dongarra, A Fox, ... The International Journal of High Performance Computing Applications 35 (4 …, 2021 | 119 | 2021 |
High-performance tensor contractions for GPUs A Abdelfattah, M Baboulin, V Dobrev, J Dongarra, C Earl, J Falcou, ... Procedia Computer Science 80, 108-118, 2016 | 76 | 2016 |
High-performance matrix-matrix multiplications of very small matrices I Masliah, A Abdelfattah, A Haidar, S Tomov, M Baboulin, J Falcou, ... Euro-Par 2016: Parallel Processing: 22nd International Conference on …, 2016 | 69 | 2016 |
Parallel programming models for dense linear algebra on heterogeneous systems J Dongarra, M Abalenkovs, A Abdelfattah, M Gates, A Haidar, J Kurzak, ... Supercomputing Frontiers and Innovations 2 (4), 67-86, 2015 | 62 | 2015 |
KBLAS: An optimized library for dense matrix-vector multiplication on GPU accelerators A Abdelfattah, D Keyes, H Ltaief ACM Transactions on Mathematical Software (TOMS) 42 (3), 1-31, 2016 | 56 | 2016 |
Efficient exascale discretizations: High-order finite element methods T Kolev, P Fischer, M Min, J Dongarra, J Brown, V Dobrev, T Warburton, ... The International Journal of High Performance Computing Applications 35 (6 …, 2021 | 51 | 2021 |
The design of fast and energy-efficient linear solvers: On the potential of half-precision arithmetic and iterative refinement techniques A Haidar, A Abdelfattah, M Zounon, P Wu, S Pranesh, S Tomov, ... International Conference on Computational Science, 586-600, 2018 | 51 | 2018 |
A survey of numerical methods utilizing mixed precision arithmetic A Abdelfattah, H Anzt, EG Boman, E Carson, T Cojean, J Dongarra, ... arXiv preprint arXiv:2007.06674, 2020 | 46 | 2020 |
With extreme computing, the rules have changed J Dongarra, S Tomov, P Luszczek, J Kurzak, M Gates, I Yamazaki, H Anzt, ... Computing in Science & Engineering 19 (3), 52-62, 2017 | 45 | 2017 |
Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs A Abdelfattah, S Tomov, J Dongarra 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2019 | 42 | 2019 |
A novel fast and accurate pseudo-analytical simulation approach for MOAO É Gendron, A Charara, A Abdelfattah, D Gratadour, D Keyes, H Ltaief, ... Adaptive Optics Systems IV 9148, 2148-2160, 2014 | 38 | 2014 |
A guide for achieving high performance with very small matrices on GPU: a case study of batched LU and Cholesky factorizations A Haidar, A Abdelfattah, M Zounon, S Tomov, J Dongarra IEEE Transactions on Parallel and Distributed Systems 29 (5), 973-984, 2017 | 26 | 2017 |
A set of batched basic linear algebra subprograms and LAPACK routines A Abdelfattah, T Costa, J Dongarra, M Gates, A Haidar, S Hammarling, ... ACM Transactions on Mathematical Software (TOMS) 47 (3), 1-23, 2021 | 25 | 2021 |
GPU algorithms for efficient exascale discretizations A Abdelfattah, V Barra, N Beams, R Bleile, J Brown, JS Camier, R Carson, ... Parallel Computing 108, 102841, 2021 | 24 | 2021 |
Design, optimization, and benchmarking of dense linear algebra algorithms on AMD GPUs C Brown, A Abdelfattah, S Tomov, J Dongarra 2020 IEEE High Performance Extreme Computing Conference (HPEC), 1-7, 2020 | 24 | 2020 |
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices I Masliah, A Abdelfattah, A Haidar, S Tomov, M Baboulin, J Falcou, ... Parallel Computing 81, 1-21, 2019 | 24 | 2019 |
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs A Abdelfattah, A Haidar, S Tomov, J Dongarra Proceedings of the International Conference on Supercomputing, 1-10, 2017 | 24 | 2017 |
Fast Cholesky factorization on GPUs for batch and native modes in MAGMA A Abdelfattah, A Haidar, S Tomov, J Dongarra Journal of Computational Science 20, 85-93, 2017 | 24 | 2017 |
Evaluating the performance of NVIDIA’s A100 Ampere GPU for sparse and batched computations H Anzt, YM Tsai, A Abdelfattah, T Cojean, J Dongarra 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High …, 2020 | 23 | 2020 |