Scientific applications require solvers that work on many small-size problems that are independent of each other. At the same time, high-end hardware evolves rapidly and …
We present a review of the current best practices in parallel programming models for dense linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, stand …
A Haidar, A Abdelfattah, M Zounon… - … on Parallel and …, 2017 - ieeexplore.ieee.org
We present a high-performance GPU kernel with a substantial speedup over vendor libraries for very small matrix computations. In addition, we discuss most of the challenges …
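As a rough illustration of the kind of kernel design such work targets (not the cited paper's actual kernel), a common pattern for very small matrices is to assign one thread block per matrix and one thread per output element. The sketch below assumes column-major storage and a compile-time size N small enough for an N x N thread block; all names are illustrative.

// Hypothetical sketch: one thread block per small matrix in the batch,
// one thread per output element of C = A * B (column-major).
#include <cuda_runtime.h>

template <int N>
__global__ void small_gemm_batched(const double* A, const double* B, double* C,
                                   int batch_count)
{
    int batch = blockIdx.x;                       // which matrix in the batch
    if (batch >= batch_count) return;

    const double* a = A + (size_t)batch * N * N;  // this block's A matrix
    const double* b = B + (size_t)batch * N * N;  // this block's B matrix
    double*       c = C + (size_t)batch * N * N;  // this block's C matrix

    int row = threadIdx.x;                        // output row for this thread
    int col = threadIdx.y;                        // output column for this thread

    double sum = 0.0;
    for (int k = 0; k < N; ++k)
        sum += a[row + k * N] * b[k + col * N];   // column-major indexing
    c[row + col * N] = sum;
}

// Launch sketch: one block per matrix, an N x N thread block per matrix, e.g.
// small_gemm_batched<16><<<batch_count, dim3(16, 16)>>>(dA, dB, dC, batch_count);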
This paper proposes an API for Batched Basic Linear Algebra Subprograms (Batched BLAS). We focus on many independent BLAS operations on small matrices that are grouped …
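For a concrete feel of the interface style such a proposal standardizes, the existing cuBLAS pointer-array batched GEMM is shown below; it is not the proposed Batched BLAS API itself, and the wrapper name and arguments are assumptions for illustration. Each entry of the pointer arrays points to one small matrix on the device.

/* Sketch: pointer-array batched GEMM via cuBLAS's cublasDgemmBatched.
 * d_Aarray/d_Barray/d_Carray are device arrays of device pointers,
 * one pointer per small matrix in the batch. */
#include <cublas_v2.h>

cublasStatus_t run_batched_gemm(cublasHandle_t handle,
                                int m, int n, int k,
                                const double* const* d_Aarray, int lda,
                                const double* const* d_Barray, int ldb,
                                double* const*       d_Carray, int ldc,
                                int batch_count)
{
    const double alpha = 1.0, beta = 0.0;
    /* One call performs batch_count independent m-by-n-by-k multiplies. */
    return cublasDgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                              m, n, k, &alpha,
                              d_Aarray, lda,
                              d_Barray, ldb,
                              &beta,
                              d_Carray, ldc,
                              batch_count);
}

Grouping the operations into a single call like this lets the library schedule all the small multiplies together rather than paying a launch and dispatch cost per matrix.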
I Kunz, P Stephanow - 2017 IEEE 31st International …, 2017 - ieeexplore.ieee.org
Current research on cloud service certification is working on techniques to continuously, i.e. automatically and repeatedly, assess whether cloud services satisfy certification criteria …
As modern hardware keeps evolving, an increasingly effective approach to developing energy-efficient, high-performance solvers is to design them to work on many small-size …
A Abdelfattah, A Haidar, S Tomov… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Many scientific applications, ranging from national security to medical advances, require solving a number of relatively small-size independent problems. As the size of each …
A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small-sized matrices. We designed batched …
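When the small matrices of a batched problem are stored contiguously, a strided-batched interface is often used instead of pointer arrays. The sketch below uses cuBLAS's cublasDgemmStridedBatched purely as an example of that layout; the function name is real, but the wrapper, sizes, and strides are illustrative assumptions, not the cited paper's design.

/* Sketch: strided-batched GEMM with cuBLAS.  The batch of n x n matrices
 * lives in one allocation, each matrix offset from the previous one by a
 * fixed stride of n*n elements. */
#include <cublas_v2.h>

cublasStatus_t run_strided_batched_gemm(cublasHandle_t handle,
                                        const double* dA, const double* dB, double* dC,
                                        int n, int batch_count)
{
    const double alpha = 1.0, beta = 0.0;
    long long stride = (long long)n * n;   /* distance between consecutive matrices */

    /* One call multiplies batch_count independent n x n matrix pairs. */
    return cublasDgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                     n, n, n, &alpha,
                                     dA, n, stride,
                                     dB, n, stride,
                                     &beta,
                                     dC, n, stride,
                                     batch_count);
}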
K Kabir, A Haidar, S Tomov, J Dongarra - High Performance Computing …, 2015 - Springer
The manycore paradigm shift, and the resulting change in modern computer architectures, has made the development of optimal numerical routines extremely challenging. In this …