Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions

A Cichocki, N Lee, I Oseledets, AH Phan… - … and Trends® in …, 2016 - nowpublishers.com
Modern applications in engineering and data science are increasingly based on
multidimensional data of exceedingly high volume, variety, and structural richness …

Low-rank tensor networks for dimensionality reduction and large-scale optimization problems: Perspectives and challenges part 1

A Cichocki, N Lee, IV Oseledets, AH Phan… - arXiv preprint arXiv …, 2016 - arxiv.org
Machine learning and data mining algorithms are becoming increasingly important in
analyzing large volume, multi-relational and multi-modal datasets, which are often …

Many-body quantum chemistry on massively parallel computers

JA Calvin, C Peng, V Rishi, A Kumar… - Chemical …, 2020 - ACS Publications
The deployment of many-body quantum chemistry methods onto massively parallel high-
performance computing (HPC) platforms is reviewed. The particular focus is on highly …

Design of a high-performance GEMM-like tensor–tensor multiplication

P Springer, P Bientinesi - ACM Transactions on Mathematical Software …, 2018 - dl.acm.org
We present “GEMM-like Tensor–Tensor multiplication” (GETT), a novel approach for dense
tensor contractions that mirrors the design of a high-performance general matrix–matrix …

High-performance tensor contraction without transposition

DA Matthews - SIAM Journal on Scientific Computing, 2018 - SIAM
Tensor computations, in particular tensor contraction (TC), are important kernels in many
scientific computing applications. Due to the fundamental similarity of TC to matrix …

Tensor contractions with extended BLAS kernels on CPU and GPU

Y Shi, UN Niranjan, A Anandkumar… - 2016 IEEE 23rd …, 2016 - ieeexplore.ieee.org
Tensor contractions constitute a key computational ingredient of numerical multi-linear
algebra. However, as the order and dimension of tensors grow, the time and space …

High-performance generalized tensor operations: A compiler-oriented approach

R Gareev, T Grosser, M Kruse - ACM Transactions on Architecture and …, 2018 - dl.acm.org
The efficiency of tensor contraction is of great importance. Compilers cannot optimize it well
enough to come close to the performance of expert-tuned implementations. All existing …

Optimizing sparse tensor times matrix on GPUs

Y Ma, J Li, X Wu, C Yan, J Sun, R Vuduc - Journal of Parallel and …, 2019 - Elsevier
This work optimizes tensor-times-dense matrix multiply (Ttm) for general sparse and semi-
sparse tensors on CPU and NVIDIA GPU platforms. Ttm is a computational kernel in tensor …

Optimizing sparse tensor times matrix on multi-core and many-core architectures

J Li, Y Ma, C Yan, R Vuduc - 2016 6th Workshop on Irregular …, 2016 - ieeexplore.ieee.org
This paper presents the optimized design and implementation of sparse tensor-times-dense
matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in …

Efficient molecular quantum dynamics in coordinate and phase space using pruned bases

HR Larsson, B Hartke, DJ Tannor - The Journal of Chemical Physics, 2016 - pubs.aip.org
We present an efficient implementation of dynamically pruned quantum dynamics, both in
coordinate space and in phase space. We combine the ideas behind the biorthogonal von …