Performance, design, and autotuning of batched GEMM for GPUs

A Abdelfattah, A Haidar, S Tomov… - … Conference, ISC High …, 2016 - Springer
The general matrix-matrix multiplication (GEMM) is the most important numerical kernel in
dense linear algebra, and is the key component for obtaining high performance in most …
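
The batched operation these kernels target is simply standard GEMM applied independently to every member of a batch. The plain-C loop below is a minimal reference sketch of those semantics for equally sized, column-major matrices; the function name and pointer-array layout are illustrative assumptions, not the authors' GPU kernel or any library's API.

```c
/* Reference (CPU) semantics of a fixed-size batched DGEMM:
 * for each b in [0, batch_count):  C[b] = alpha*A[b]*B[b] + beta*C[b]
 * All matrices are column-major; A[b] is m-by-k, B[b] is k-by-n, C[b] is m-by-n. */
static void dgemm_batched_ref(int m, int n, int k, double alpha,
                              const double *const *A, int lda,
                              const double *const *B, int ldb,
                              double beta,
                              double *const *C, int ldc,
                              int batch_count)
{
    for (int b = 0; b < batch_count; ++b)
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i) {
                double acc = 0.0;
                for (int p = 0; p < k; ++p)
                    acc += A[b][i + p * lda] * B[b][p + j * ldb];
                C[b][i + j * ldc] = alpha * acc + beta * C[b][i + j * ldc];
            }
}
```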

RETRACTED: Batched matrix computations on hardware accelerators based on GPUs

A Haidar, T Dong, P Luszczek… - … Journal of High …, 2015 - journals.sagepub.com
Scientific applications require solvers that work on many small-size problems that are
independent of each other. At the same time, the high-end hardware evolves rapidly and …

Parallel programming models for dense linear algebra on heterogeneous systems

J Dongarra, M Abalenkovs, A Abdelfattah… - Supercomputing …, 2015 - superfri.susu.ru
We present a review of the current best practices in parallel programming models for dense
linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, stand …

A guide for achieving high performance with very small matrices on GPU: a case study of batched LU and Cholesky factorizations

A Haidar, A Abdelfattah, M Zounon… - … on Parallel and …, 2017 - ieeexplore.ieee.org
We present a high-performance GPU kernel with a substantial speedup over vendor
libraries for very small matrix computations. In addition, we discuss most of the challenges …
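
For context, the per-problem work in such a batch is a standard unblocked factorization applied to a matrix small enough to fit in registers or shared memory. The sketch below shows the Cholesky case as a plain-C CPU reference only (hypothetical names; the paper's contribution is the GPU kernel design, not this loop). On a GPU, each problem would typically be mapped to its own thread block.

```c
#include <math.h>

/* Unblocked lower Cholesky of one small n-by-n SPD matrix (column-major),
 * overwriting the lower triangle with L such that A = L*L^T.
 * Returns 0 on success, j+1 if the leading minor of order j+1 is not positive. */
static int cholesky_unblocked(int n, double *A, int lda)
{
    for (int j = 0; j < n; ++j) {
        double d = A[j + j * lda];
        for (int p = 0; p < j; ++p)
            d -= A[j + p * lda] * A[j + p * lda];
        if (d <= 0.0)
            return j + 1;
        d = sqrt(d);
        A[j + j * lda] = d;
        for (int i = j + 1; i < n; ++i) {
            double s = A[i + j * lda];
            for (int p = 0; p < j; ++p)
                s -= A[i + p * lda] * A[j + p * lda];
            A[i + j * lda] = s / d;
        }
    }
    return 0;
}

/* Batched driver: factor each matrix independently. */
static void cholesky_batched_ref(int n, double *const *A, int lda,
                                 int *info, int batch_count)
{
    for (int b = 0; b < batch_count; ++b)
        info[b] = cholesky_unblocked(n, A[b], lda);
}
```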

A proposed API for batched basic linear algebra subprograms

J Dongarra, I Duff, M Gates, A Haidar, S Hammarling… - 2016 - drive.google.com
This paper proposes an API for Batched Basic Linear Algebra Subprograms (Batched
BLAS). We focus on many independent BLAS operations on small matrices that are grouped …
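
The central idea indicated by the snippet is to expose many small, independent BLAS calls through a single interface in which problems sharing the same parameters are grouped. The exact interface is specified in the paper; the code below is only a hypothetical sketch of such a grouped DGEMM (column-major, no transpose options), with every name and argument assumed for illustration.

```c
/* Hypothetical grouped batched DGEMM sketch -- not the paper's API text.
 * Problems are partitioned into group_count groups; all problems in group g
 * share m[g], n[g], k[g], alpha[g], beta[g], and leading dimensions, while the
 * pointer arrays A, B, C list every matrix across all groups consecutively. */
static void dgemm_batch_grouped_ref(
    const int *m, const int *n, const int *k,
    const double *alpha,
    const double *const *A, const int *lda,
    const double *const *B, const int *ldb,
    const double *beta,
    double *const *C, const int *ldc,
    int group_count, const int *group_size)
{
    int idx = 0;                         /* running index into A/B/C */
    for (int g = 0; g < group_count; ++g)
        for (int s = 0; s < group_size[g]; ++s, ++idx)
            for (int j = 0; j < n[g]; ++j)
                for (int i = 0; i < m[g]; ++i) {
                    double acc = 0.0;
                    for (int p = 0; p < k[g]; ++p)
                        acc += A[idx][i + p * lda[g]] * B[idx][p + j * ldb[g]];
                    C[idx][i + j * ldc[g]] = alpha[g] * acc
                                           + beta[g] * C[idx][i + j * ldc[g]];
                }
}
```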

A process model to support continuous certification of cloud services

I Kunz, P Stephanow - 2017 IEEE 31st International …, 2017 - ieeexplore.ieee.org
Current research on cloud service certification is working on techniques to continuously, i.e.,
automatically and repeatedly, assess whether cloud services satisfy certification criteria …

Optimization for performance and energy for batched matrix computations on GPUs

A Haidar, T Dong, P Luszczek, S Tomov… - Proceedings of the 8th …, 2015 - dl.acm.org
As modern hardware keeps evolving, an increasingly effective approach to developing
energy-efficient and high-performance solvers is to design them to work on many small size …

On the development of variable size batched computation for heterogeneous parallel architectures

A Abdelfattah, A Haidar, S Tomov… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Many scientific applications, ranging from national security to medical advances, require
solving a number of relatively small-size independent problems. As the size of each …
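
Variable-size ("vbatched") routines drop the assumption that all problems in a batch share one set of dimensions: every problem carries its own sizes and leading dimension. As a minimal, hedged illustration (hypothetical names, and using a matrix-vector product for brevity rather than the paper's full set of routines), the reference semantics look like the following.

```c
/* CPU reference sketch of a variable-size batched DGEMV:
 * y[b] = alpha*A[b]*x[b] + beta*y[b], where A[b] is m[b]-by-n[b], column-major,
 * and each problem b has its own dimensions m[b], n[b] and leading dimension lda[b]. */
static void dgemv_vbatched_ref(const int *m, const int *n, double alpha,
                               const double *const *A, const int *lda,
                               const double *const *x, double beta,
                               double *const *y, int batch_count)
{
    for (int b = 0; b < batch_count; ++b)
        for (int i = 0; i < m[b]; ++i) {
            double acc = 0.0;
            for (int j = 0; j < n[b]; ++j)
                acc += A[b][i + j * lda[b]] * x[b][j];
            y[b][i] = alpha * acc + beta * y[b][i];
        }
}
```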

Optimizing the SVD bidiagonalization process for a batch of small matrices

T Dong, A Haidar, S Tomov, J Dongarra - Procedia Computer Science, 2017 - Elsevier
A challenging class of problems arising in many GPU applications, called batched problems,
involves linear algebra operations on many small-sized matrices. We designed batched …

On the design, development, and analysis of optimized matrix-vector multiplication routines for coprocessors

K Kabir, A Haidar, S Tomov, J Dongarra - High Performance Computing …, 2015 - Springer
The manycore paradigm shift, and the resulting change in modern computer architectures,
has made the development of optimal numerical routines extremely challenging. In this …