Primal-Attention: Self-attention through asymmetric kernel SVD in primal representation

Y Chen, Q Tao, F Tonin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, a new line of work has emerged to understand and improve self-attention in
Transformers by treating it as a kernel machine. However, existing works apply the methods …

Understanding neural networks with reproducing kernel Banach spaces

F Bartolucci, E De Vito, L Rosasco… - Applied and Computational …, 2023 - Elsevier
Characterizing the function spaces corresponding to neural networks can provide a way to
understand their properties. In this paper we discuss how the theory of reproducing kernel …

Physics-informed Gaussian process regression generalizes linear PDE solvers

M Pförtner, I Steinwart, P Hennig, J Wenger - arXiv preprint arXiv …, 2022 - arxiv.org
Linear partial differential equations (PDEs) are an important, widely applied class of
mechanistic models, describing physical processes such as heat transfer, electromagnetism …

Sparse machine learning in Banach spaces

Y Xu - Applied Numerical Mathematics, 2023 - Elsevier
The aim of this expository paper is to explain to graduate students and beginning
researchers in the field of mathematics, statistics and engineering the fundamental concept …

Duality for neural networks through reproducing kernel Banach spaces

L Spek, TJ Heeringa, F Schwenninger… - arXiv preprint arXiv …, 2022 - arxiv.org
Reproducing Kernel Hilbert spaces (RKHS) have been a very successful tool in various
areas of machine learning. Recently, Barron spaces have been used to prove bounds on the …

Koopman and Perron–Frobenius operators on reproducing kernel Banach spaces

M Ikeda, I Ishikawa, C Schlosser - Chaos: An Interdisciplinary Journal …, 2022 - pubs.aip.org
Koopman and Perron–Frobenius operators for dynamical systems are becoming popular in
a number of fields in science recently. Properties of the Koopman operator essentially …

Nonlinear SVD with Asymmetric Kernels: feature learning and asymmetric Nyström method

Q Tao, F Tonin, P Patrinos, JAK Suykens - arXiv preprint arXiv:2306.07040, 2023 - arxiv.org
Asymmetric data naturally exist in real life, such as directed graphs. Different from the
common kernel methods requiring Mercer kernels, this paper tackles the asymmetric kernel …

Learning with asymmetric kernels: Least squares and feature interpretation

M He, F He, L Shi, X Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Asymmetric kernels naturally exist in real life, e.g., for conditional probability and directed
graphs. However, most of the existing kernel-based learning methods require kernels to be …

Transformers are deep infinite-dimensional non-Mercer binary kernel machines

MA Wright, JE Gonzalez - arXiv preprint arXiv:2106.01506, 2021 - arxiv.org
Despite their ubiquity in core AI fields like natural language processing, the mechanics of
deep attention-based neural networks like the Transformer model are not fully understood …

Augmented balancing weights as linear regression

D Bruns-Smith, O Dukes, A Feller… - arXiv preprint arXiv …, 2023 - arxiv.org
We provide a novel characterization of augmented balancing weights, also known as
automatic debiased machine learning (AutoDML). These popular doubly robust or double …