Simple hardware-efficient long convolutions for sequence modeling

DY Fu, EL Epstein, E Nguyen… - International …, 2023 - proceedings.mlr.press
State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …
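
The primitive behind this line of work is a long convolution computed in the frequency domain. A minimal NumPy sketch, assuming a 1-D sequence and a learned kernel of the same length (the function name and the decaying kernel are illustrative, not the authors' implementation):

```python
import numpy as np

def fft_long_conv(u, k):
    """Causal (linear, non-circular) convolution of u with kernel k via FFT.

    Zero-padding both signals to length 2L avoids circular wrap-around,
    giving the long convolution at O(L log L) cost instead of O(L^2).
    """
    L = len(u)
    n = 2 * L
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)
    return y[:L]

# hypothetical usage: length-1024 input, exponentially decaying kernel
u = np.random.randn(1024)
k = np.random.randn(1024) * np.exp(-np.arange(1024) / 256.0)
y = fft_long_conv(u, k)
```

The paper's contribution is making this FFT path fast and stable in practice, which the sketch does not attempt.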

Monarch: Expressive structured matrices for efficient and accurate training

T Dao, B Chen, NS Sohoni, A Desai… - International …, 2022 - proceedings.mlr.press
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense …
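
For orientation: a Monarch matrix is commonly presented as a product of two block-diagonal factors interleaved with fixed stride permutations. The convention below (M = P B P A, with the permutation P realized as a transpose of an (m, m) view) is an assumption chosen for a compact sketch, not the paper's exact parameterization:

```python
import numpy as np

def monarch_matvec(A_blocks, B_blocks, x):
    """Matvec with a Monarch-style matrix M = P B P A (sketch convention).

    A_blocks, B_blocks: (m, m, m) arrays holding m dense m x m blocks of
    the two block-diagonal factors. Cost is O(n^{3/2}) for n = m^2,
    versus O(n^2) for a dense matvec.
    """
    m = A_blocks.shape[0]
    X = x.reshape(m, m)                         # split x into m chunks
    Z = np.einsum('bij,bj->bi', A_blocks, X)    # apply block-diagonal A
    W = np.einsum('bij,bj->bi', B_blocks, Z.T)  # permute, then apply B
    return W.T.reshape(-1)                      # permute back, flatten

m = 16
rng = np.random.default_rng(0)
A = rng.normal(size=(m, m, m))
B = rng.normal(size=(m, m, m))
y = monarch_matvec(A, B, rng.normal(size=m * m))
```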

Random features for kernel approximation: A survey on algorithms, theory, and beyond

F Liu, X Huang, Y Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The class of random features is one of the most popular techniques to speed up kernel
methods in large-scale problems. Related works have been recognized by the NeurIPS Test …
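
The canonical construction the survey builds on is random Fourier features (Rahimi and Recht) for the RBF kernel; a minimal sketch, with all names illustrative:

```python
import numpy as np

def rff_features(X, D, gamma, rng):
    """Random Fourier features approximating k(x, y) = exp(-gamma * ||x - y||^2).

    z(x) = sqrt(2/D) * cos(W x + b), with rows of W drawn from N(0, 2*gamma*I)
    and b uniform on [0, 2*pi], so that z(x) @ z(y) ~= k(x, y).
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, D=2000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T   # ~ exp(-0.5 * ||x_i - x_j||^2), exact as D grows
```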

Pixelated butterfly: Simple and efficient sparse training for neural network models

T Dao, B Chen, K Liang, J Yang, Z Song… - arXiv preprint arXiv …, 2021 - arxiv.org
Overparameterized neural networks generalize well but are expensive to train. Ideally, one
would like to reduce their computational cost while retaining their generalization benefits …

Parameter-efficient orthogonal finetuning via butterfly factorization

W Liu, Z Qiu, Y Feng, Y Xiu, Y Xue, L Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large foundation models are becoming ubiquitous, but training them from scratch is
prohibitively expensive. Thus, efficiently adapting these powerful models to downstream …
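
The underlying idea of orthogonal finetuning is to rotate pretrained weights, W' = R W with R orthogonal, preserving norms and pairwise angles. A toy sketch using a Cayley-transform parameterization of R; the butterfly factorization that makes BOFT parameter-efficient is omitted, and this is not the authors' code:

```python
import numpy as np

def cayley_orthogonal(S):
    """Build an orthogonal matrix from skew-symmetric S via the Cayley
    transform R = (I + S)^{-1} (I - S)."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(I + S, I - S)

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
S = A - A.T                        # skew-symmetric trainable parameter
R = cayley_orthogonal(S)
W = rng.normal(size=(8, 4))        # stand-in for a pretrained weight
W_adapted = R @ W                  # orthogonal update of the weight
assert np.allclose(R @ R.T, np.eye(8), atol=1e-8)
```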

Learning fast algorithms for linear transforms using butterfly factorizations

T Dao, A Gu, M Eichhorn, A Rudra… - … conference on machine …, 2019 - proceedings.mlr.press
Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier
transform, discrete cosine transform, and other structured transformations such as …
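
A butterfly factorization writes an n x n transform as a product of log2(n) sparse factors, each mixing pairs of coordinates at a doubling stride with 2x2 blocks, so a matvec costs O(n log n). A minimal sketch (the twiddle layout and names are assumptions for illustration; with DFT twiddles and a bit-reversal permutation this structure recovers the FFT):

```python
import numpy as np

def butterfly_matvec(twiddles, x):
    """Apply a butterfly matrix B = B_{log n} ... B_2 B_1 to x.

    twiddles: list of (n/2, 2, 2) arrays, one per factor; factor k pairs
    index i with i + stride inside each block of size 2 * stride.
    """
    n = len(x)
    y = x.astype(float).copy()
    stride = 1
    for T in twiddles:                      # one sparse factor per level
        p = 0                               # pair counter into T
        for start in range(0, n, 2 * stride):
            for i in range(start, start + stride):
                j = i + stride
                a, b = y[i], y[j]
                y[i] = T[p, 0, 0] * a + T[p, 0, 1] * b
                y[j] = T[p, 1, 0] * a + T[p, 1, 1] * b
                p += 1
        stride *= 2
    return y

rng = np.random.default_rng(0)
n = 8
tw = [rng.normal(size=(n // 2, 2, 2)) for _ in range(int(np.log2(n)))]
y = butterfly_matvec(tw, rng.normal(size=n))
```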

Accelerated linearized Laplace approximation for Bayesian deep learning

Z Deng, F Zhou, J Zhu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Laplace approximation (LA) and its linearized variant (LLA) enable effortless adaptation of
pretrained deep neural networks to Bayesian neural networks. The generalized Gauss …
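
For context, the Laplace recipe on a toy convex model: fit a MAP estimate, then take a Gaussian posterior whose covariance is the inverse Hessian of the negative log-joint at the MAP. A sketch for logistic regression (LLA applies the same recipe to a linearized network, swapping the Hessian for a generalized Gauss-Newton matrix; names and hyperparameters here are illustrative):

```python
import numpy as np

def laplace_logreg(X, y, prior_prec=1.0, steps=500, lr=0.5):
    """Toy Laplace approximation for Bayesian logistic regression:
    returns the Gaussian posterior N(w_map, H^{-1})."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):                       # gradient descent to MAP
        p = 1.0 / (1.0 + np.exp(-X @ w))
        g = X.T @ (p - y) + prior_prec * w       # grad of -log p(y, w)
        w -= lr * g / n
    p = 1.0 / (1.0 + np.exp(-X @ w))
    H = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec * np.eye(d)
    return w, np.linalg.inv(H)                   # posterior mean, covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
w_map, Sigma = laplace_logreg(X, y)
```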

NeuralEF: Deconstructing kernels by deep neural networks

Z Deng, J Shi, J Zhu - International Conference on Machine …, 2022 - proceedings.mlr.press
Learning the principal eigenfunctions of an integral operator defined by a kernel and a data
distribution is at the core of many machine learning problems. Traditional nonparametric …
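
The classical nonparametric route this paper contrasts with is the Nystrom method: eigendecompose the kernel matrix on m landmark points, then extend the eigenvectors out of sample. A minimal sketch, with the landmark/query split an assumption for illustration:

```python
import numpy as np

def nystrom_eigenfunctions(K_mm, K_nm, k=3):
    """Nystrom approximation of the top-k kernel eigenfunctions.

    K_mm: kernel matrix on m landmarks; K_nm: kernel between n query
    points and the landmarks. Out-of-sample extension:
    psi_j(x) = (sqrt(m) / lam_j) * k(x, Z) u_j.
    """
    m = K_mm.shape[0]
    lam, U = np.linalg.eigh(K_mm)                # ascending eigenvalues
    lam, U = lam[::-1][:k], U[:, ::-1][:, :k]    # keep the top-k pairs
    return K_nm @ U * (np.sqrt(m) / lam)         # (n, k) eigenfunction values

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))                     # landmark points
Xq = rng.normal(size=(10, 2))                    # out-of-sample queries
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None]) ** 2).sum(-1))
Psi = nystrom_eigenfunctions(rbf(Z, Z), rbf(Xq, Z), k=3)
```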

Kaleidoscope: An efficient, learnable representation for all structured linear maps

T Dao, NS Sohoni, A Gu, M Eichhorn, A Blonder… - arXiv preprint arXiv …, 2020 - arxiv.org
Modern neural network architectures use structured linear transformations, such as low-rank
matrices, sparse matrices, permutations, and the Fourier transform, to improve inference …

Uni-Fusion: Universal continuous mapping

Y Yuan, A Nüchter - IEEE Transactions on Robotics, 2024 - ieeexplore.ieee.org
We present Uni-Fusion, a universal continuous mapping framework for surfaces, surface
properties (color, infrared, etc.) and more (latent features in contrastive language-image …