A survey on distributed machine learning

J Verbraeken, M Wolting, J Katzy… - ACM Computing Surveys …, 2020 - dl.acm.org
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …

The future of computing beyond Moore's Law

J Shalf - Philosophical Transactions of the Royal Society …, 2020 - royalsocietypublishing.org
Moore's Law is a techno-economic model that has enabled the information technology
industry to double the performance and functionality of digital electronics roughly every 2 …

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …

Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence

S Raschka, J Patterson, C Nolet - Information, 2020 - mdpi.com
Smarter applications are making better use of the insights gleaned from data, having an
impact on every industry and research discipline. At the core of this revolution lie the tools …

COIL: Revisit exact lexical match in information retrieval with contextualized inverted list

L Gao, Z Dai, J Callan - arXiv preprint arXiv:2104.07186, 2021 - arxiv.org
Classical information retrieval systems such as BM25 rely on exact lexical match and carry
out search efficiently with inverted list index. Recent neural IR models shift towards soft …

A suite of tutorials for the WESTPA rare-events sampling software [Article v1.0]

AT Bogetti, B Mostofian, A Dickson… - Living journal of …, 2019 - ncbi.nlm.nih.gov
The weighted ensemble (WE) strategy has been demonstrated to be highly efficient in
generating pathways and rate constants for rare events such as protein folding and protein …

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system

J Gómez-Luna, I El Hajj, I Fernandez… - IEEE …, 2022 - ieeexplore.ieee.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

An improved chain of spheres for exchange algorithm

B Helmich-Paris, B de Souza, F Neese… - The Journal of Chemical …, 2021 - pubs.aip.org
In the present work, we describe a more accurate and efficient variant of the
chain-of-spheres algorithm (COSX) for exchange matrix computations. Higher accuracy for the …

ExTensor: An accelerator for sparse tensor algebra

K Hegde, H Asghari-Moghaddam, M Pellauer… - Proceedings of the …, 2019 - dl.acm.org
Generalized tensor algebra is a prime candidate for acceleration via customized ASICs.
Modern tensors feature a wide range of data sparsity, with the density of non-zero elements …

qpOASES: A parametric active-set algorithm for quadratic programming

HJ Ferreau, C Kirches, A Potschka, HG Bock… - Mathematical …, 2014 - Springer
Many practical applications lead to optimization problems that can either be stated as
quadratic programming (QP) problems or require the solution of QP problems on a lower …